mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-16 06:03:44 +01:00

1711 lines

42 KiB

C

Raw Normal View History

Make git-pack-objects a builtin Signed-off-by: Matthias Kestenholz <matthias@spinlock.ch> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-03 17:24:36 +02:00			`#include "builtin.h"`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`#include "cache.h"`
			`#include "object.h"`
Use blob_, commit_, tag_, and tree_type throughout. This replaces occurences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-02 14:44:09 +02:00			`#include "blob.h"`
			`#include "commit.h"`
			`#include "tag.h"`
			`#include "tree.h"`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`#include "delta.h"`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`#include "pack.h"`
git-pack-objects: write the pack files with a SHA1 csum We want to be able to check their integrity later, and putting the sha1-sum of the contents at the end is a good thing. The writing routines are generic, so we could try to re-use them for the index file, instead of having the same logic duplicated. Update unpack-objects to know about the extra 20 bytes at the end of the index. 2005-06-27 05:27:56 +02:00			`#include "csum-file.h"`
tree/diff header cleanup. Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-30 08:55:43 +02:00			`#include "tree-walk.h"`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`#include "diff.h"`
			`#include "revision.h"`
			`#include "list-objects.h"`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
git-pack-objects progress flag documentation and cleanup This adds documentation for --progress and --all-progress, remove a duplicate --progress handling and make usage string more readable. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-07 16:51:23 +01:00			`static const char pack_usage[] = "\`
			`git-pack-objects [{ -q \| --progress \| --all-progress }] \n\`
			`[--local] [--incremental] [--window=N] [--depth=N] \n\`
			`[--no-reuse-delta] [--delta-base-offset] [--non-empty] \n\`
Teach git-repack to preserve objects referred to by reflog entries. This adds a new option --reflog to pack-objects and revision machinery; do not bother documenting it for now, since this is only useful for local repacking. When the option is passed, objects reachable from reflog entries are marked as interesting while computing the set of objects to pack. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-19 02:25:28 +01:00			`[--revs [--unpacked \| --all]*] [--reflog] [--stdout \| base-name] \n\`
git-pack-objects progress flag documentation and cleanup This adds documentation for --progress and --all-progress, remove a duplicate --progress handling and make usage string more readable. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-07 16:51:23 +01:00			`[<ref-list \| <object-list]";`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
			`struct object_entry {`
			`unsigned char sha1[20];`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`unsigned long size; /* uncompressed size */`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`unsigned long offset; /* offset into the final pack file;`
			`* nonzero if already written.`
			`*/`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`unsigned int depth; /* delta depth */`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`unsigned int delta_limit; /* base adjustment for in-pack delta */`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`unsigned int hash; /* name hint hash */`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`enum object_type type;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`enum object_type in_pack_type; /* could be delta */`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`unsigned long delta_size; /* delta data size (uncompressed) */`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`#define in_pack_header_size delta_size /* only when reusing pack data */`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`struct object_entry delta; / delta base object */`
			`struct packed_git in_pack; / already in pack */`
			`unsigned int in_pack_offset;`
Fix more typos, primarily in the code The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-10 07:50:18 +02:00			`struct object_entry delta_child; / deltified objects who bases me */`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`struct object_entry delta_sibling; / other deltified objects who`
			`* uses the same base as me`
			`*/`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`int preferred_base; /* we do not pack this, but is encouraged to`
			`* be used as the base objectto delta huge`
			`* objects against.`
			`*/`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`};`

pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`/*`
Fix more typos, primarily in the code The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-10 07:50:18 +02:00			`* Objects we are going to pack are collected in objects array (dynamically`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`* expanded). nr_objects & nr_alloc controls this array. They are stored`
			`* in the order we see -- typically rev-list --objects order that gives us`
			`* nice "minimum seek" order.`
			`*`
			`* sorted-by-sha ans sorted-by-type are arrays of pointers that point at`
			`* elements in the objects array. The former is used to build the pack`
			`* index (lists object names in the ascending order to help offset lookup),`
			`* and the latter is used to group similar things together by try_delta()`
			`* heuristics.`
			`*/`

Make the name of a pack-file depend on the objects packed there-in. This means that the .git/objects/pack directory is also rsync'able, since the filenames created there-in are either unique or refer to the same data. Otherwise you might not be able to pull from a directory that is partly packed without having to worry about missing objects due to pack-file name clashes. 2005-07-04 00:34:04 +02:00			`static unsigned char object_list_sha1[20];`
remove unnecessary initializations [jc: I needed to hand merge the changes to the updated codebase, so the result needs to be checked.] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-15 19:23:48 +02:00			`static int non_empty;`
			`static int no_reuse_delta;`
			`static int local;`
			`static int incremental;`
make git-pack-objects able to create deltas with offset to base This is enabled with --delta-base-offset only, and doesn't work with pack data reuse yet. The idea is to allow for the fetch protocol to use an extension flag to notify the remote end that --delta-base-offset can be used with git-pack-objects. Eventually git-repack will always provide this flag. With this, all delta base objects are now pushed before deltas that depend on them. This is a requirements for OBJ_OFS_DELTA. This is not a requirement for OBJ_REF_DELTA but always doing so makes the code simpler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:09:44 +02:00			`static int allow_ofs_delta;`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static struct object_entry sorted_by_sha, sorted_by_type;`
remove unnecessary initializations [jc: I needed to hand merge the changes to the updated codebase, so the result needs to be checked.] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-15 19:23:48 +02:00			`static struct object_entry *objects;`
			`static int nr_objects, nr_alloc, nr_result;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static const char *base_name;`
csum-file interface updates: return resulting SHA1 Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to. 2005-06-27 07:01:46 +02:00			`static unsigned char pack_file_sha1[20];`
Make pack-objects chattier. You could give -q to squelch it, but currently no tool does it. This would make 'git clone host:repo here' over ssh not silent again. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 22:01:54 +01:00			`static int progress = 1;`
remove unnecessary initializations [jc: I needed to hand merge the changes to the updated codebase, so the result needs to be checked.] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-15 19:23:48 +02:00			`static volatile sig_atomic_t progress_update;`
pack-objects: check pack.window for default window size For some repositories, deltas simply don't make sense. One can disable them for git-repack by adding --window, but git-push insists on making the deltas which can be very CPU-intensive for little benefit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-23 07:50:30 +02:00			`static int window = 10;`
pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-02 00:05:12 +02:00			`static int pack_to_stdout;`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`static int num_preferred_base;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`/*`
			`* The object names in objects array are hashed with this hashtable,`
			`* to help looking up the entry by object name. Binary search from`
			`* sorted_by_sha is also possible but this was easier to code and faster.`
			`* This hashtable is built after all the objects are seen.`
			`*/`
remove unnecessary initializations [jc: I needed to hand merge the changes to the updated codebase, so the result needs to be checked.] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-15 19:23:48 +02:00			`static int *object_ix;`
			`static int object_ix_hashsz;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00
			`/*`
			`* Pack index for existing packs give us easy access to the offsets into`
			`* corresponding pack file where each object's data starts, but the entries`
			`* do not store the size of the compressed representation (uncompressed`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`* size is easily available by examining the pack entry header). It is`
			`* also rather expensive to find the sha1 for an object given its offset.`
			`*`
			`* We build a hashtable of existing packs (pack_revindex), and keep reverse`
			`* index here -- pack index file is sorted by object name mapping to offset;`
			`* this pack_revindex[].revindex array is a list of offset/index_nr pairs`
			`* ordered by offset, so if you know the offset of an object, next offset`
			`* is where its packed representation ends and the index_nr can be used to`
			`* get the object sha1 from the main index.`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`*/`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`struct revindex_entry {`
			`unsigned int offset;`
			`unsigned int nr;`
			`};`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`struct pack_revindex {`
			`struct packed_git *p;`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`struct revindex_entry *revindex;`
			`};`
			`static struct pack_revindex *pack_revindex;`
remove unnecessary initializations [jc: I needed to hand merge the changes to the updated codebase, so the result needs to be checked.] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-15 19:23:48 +02:00			`static int pack_revindex_hashsz;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00
			`/*`
			`* stats`
			`*/`
remove unnecessary initializations [jc: I needed to hand merge the changes to the updated codebase, so the result needs to be checked.] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-15 19:23:48 +02:00			`static int written;`
			`static int written_delta;`
			`static int reused;`
			`static int reused_delta;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00
			`static int pack_revindex_ix(struct packed_git *p)`
			`{`
Re-fix compilation warnings. Commit 8fcf1ad9c68e15d881194c8544e7c11d33529c2b has a combination of double cast and Andreas' switch to using unsigned long ... just the latter is sufficient (and a lot less ugly than using the double cast). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-02 00:01:53 +01:00			`unsigned long ui = (unsigned long)p;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`int i;`

			`ui = ui ^ (ui >> 16); /* defeat structure alignment */`
			`i = (int)(ui % pack_revindex_hashsz);`
			`while (pack_revindex[i].p) {`
			`if (pack_revindex[i].p == p)`
			`return i;`
			`if (++i == pack_revindex_hashsz)`
			`i = 0;`
			`}`
			`return -1 - i;`
			`}`

			`static void prepare_pack_ix(void)`
			`{`
			`int num;`
			`struct packed_git *p;`
			`for (num = 0, p = packed_git; p; p = p->next)`
			`num++;`
			`if (!num)`
			`return;`
			`pack_revindex_hashsz = num * 11;`
			`pack_revindex = xcalloc(sizeof(*pack_revindex), pack_revindex_hashsz);`
			`for (p = packed_git; p; p = p->next) {`
			`num = pack_revindex_ix(p);`
			`num = - 1 - num;`
			`pack_revindex[num].p = p;`
			`}`
			`/* revindex elements are lazily initialized */`
			`}`

			`static int cmp_offset(const void a_, const void b_)`
			`{`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`const struct revindex_entry *a = a_;`
			`const struct revindex_entry *b = b_;`
			`return (a->offset < b->offset) ? -1 : (a->offset > b->offset) ? 1 : 0;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`

			`/*`
			`* Ordered list of offsets of objects in the pack.`
			`*/`
			`static void prepare_pack_revindex(struct pack_revindex *rix)`
			`{`
			`struct packed_git *p = rix->p;`
			`int num_ent = num_packed_objects(p);`
			`int i;`
			`void *index = p->index_base + 256;`

make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`rix->revindex = xmalloc(sizeof(rix->revindex) (num_ent + 1));`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`for (i = 0; i < num_ent; i++) {`
Remove all void-pointer arithmetic. ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:09 +02:00			`unsigned int hl = ((unsigned int )((char ) index + 24i));`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`rix->revindex[i].offset = ntohl(hl);`
			`rix->revindex[i].nr = i;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`
			`/* This knows the pack format -- the 20-byte trailer`
			`* follows immediately after the last object data.`
			`*/`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`rix->revindex[num_ent].offset = p->pack_size - 20;`
			`rix->revindex[num_ent].nr = -1;`
			`qsort(rix->revindex, num_ent, sizeof(*rix->revindex), cmp_offset);`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`

make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`static struct revindex_entry * find_packed_object(struct packed_git *p,`
			`unsigned int ofs)`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`{`
			`int num;`
			`int lo, hi;`
			`struct pack_revindex *rix;`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`struct revindex_entry *revindex;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`num = pack_revindex_ix(p);`
			`if (num < 0)`
			`die("internal error: pack revindex uninitialized");`
			`rix = &pack_revindex[num];`
			`if (!rix->revindex)`
			`prepare_pack_revindex(rix);`
			`revindex = rix->revindex;`
			`lo = 0;`
			`hi = num_packed_objects(p) + 1;`
			`do {`
			`int mi = (lo + hi) / 2;`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`if (revindex[mi].offset == ofs) {`
			`return revindex + mi;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`else if (ofs < revindex[mi].offset)`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`hi = mi;`
			`else`
			`lo = mi + 1;`
			`} while (lo < hi);`
			`die("internal error: pack revindex corrupt");`
			`}`

make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`static unsigned long find_packed_object_size(struct packed_git *p,`
			`unsigned long ofs)`
			`{`
			`struct revindex_entry *entry = find_packed_object(p, ofs);`
			`return entry[1].offset - ofs;`
			`}`

			`static unsigned char find_packed_object_name(struct packed_git p,`
			`unsigned long ofs)`
			`{`
			`struct revindex_entry *entry = find_packed_object(p, ofs);`
			`return (unsigned char )(p->index_base + 256) + 24 entry->nr + 4;`
			`}`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static void delta_against(void buf, unsigned long size, struct object_entry *entry)`
			`{`
			`unsigned long othersize, delta_size;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
			`void *otherbuf = read_sha1_file(entry->delta->sha1, &type, &othersize);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`void *delta_buf;`

			`if (!otherbuf)`
			`die("unable to read %s", sha1_to_hex(entry->delta->sha1));`
[PATCH] Finish initial cut of git-pack-object/git-unpack-object pair. This finishes the initial round of git-pack-object / git-unpack-object pair. They are now good enough to be used as a transport medium: - Fix delta direction in pack-objects; the original was computing delta to create the base object from the object to be squashed, which was quite unfriendly for unpacker ;-). - Add a script to test the very basics. - Implement unpacker for both regular and deltified objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-26 13:29:18 +02:00			`delta_buf = diff_delta(otherbuf, othersize,`
[PATCH] assorted delta code cleanup This is a wrap-up patch including all the cleanups I've done to the delta code and its usage. The most important change is the factorization of the delta header handling code. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-29 08:49:56 +02:00			`buf, size, &delta_size, 0);`
Add a "max_size" parameter to diff_delta() Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens. 2005-06-26 04:30:20 +02:00			`if (!delta_buf \|\| delta_size != entry->delta_size)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`die("delta size changed");`
			`free(buf);`
			`free(otherbuf);`
			`return delta_buf;`
			`}`

Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`/*`
			`* The per-object header is a pretty dense thing, which is`
			`* - first byte: low four bits are "size", then three bits of "type",`
			`* and the high bit is "size continues".`
			`* - each byte afterwards: low seven bits are size continuation,`
			`* with the high bit being "size continues"`
			`*/`
			`static int encode_header(enum object_type type, unsigned long size, unsigned char *hdr)`
			`{`
Make git pack files use little-endian size encoding This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format. 2005-06-29 07:15:57 +02:00			`int n = 1;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`unsigned char c;`

introduce delta objects with offset to base This adds a new object, namely OBJ_OFS_DELTA, renames OBJ_DELTA to OBJ_REF_DELTA to better make the distinction between those two delta objects, and adds support for the handling of those new delta objects in sha1_file.c only. The OBJ_OFS_DELTA contains a relative offset from the delta object's position in a pack instead of the 20-byte SHA1 reference to identify the base object. Since the base is likely to be not so far away, the relative offset is more likely to have a smaller encoding on average than an absolute offset. And for those delta objects the base must always be stored first because there is no way to know the distance of later objects when streaming a pack. Hence this relative offset is always meant to be negative. The offset encoding is slightly denser than the one used for object size -- credits to <linux@horizon.com> (whoever this is) for bringing it to my attention. This allows for pack size reduction between 3.2% (Linux-2.6) to over 5% (linux-historic). Runtime pack access should be faster too since delta replay does skip a search in the pack index for each delta in a chain. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:06:49 +02:00			`if (type < OBJ_COMMIT \|\| type > OBJ_REF_DELTA)`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`die("bad type %d", type);`

Make git pack files use little-endian size encoding This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format. 2005-06-29 07:15:57 +02:00			`c = (type << 4) \| (size & 15);`
			`size >>= 4;`
			`while (size) {`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`*hdr++ = c \| 0x80;`
Make git pack files use little-endian size encoding This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format. 2005-06-29 07:15:57 +02:00			`c = size & 0x7f;`
			`size >>= 7;`
			`n++;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`}`
			`*hdr = c;`
			`return n;`
			`}`

make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`/*`
			`* we are going to reuse the existing object data as is. make`
			`* sure it is not corrupt.`
			`*/`
Loop over pack_windows when inflating/accessing data. When multiple mmaps start getting used for all pack file access it is not possible to get all data associated with a specific object in one contiguous memory region. This limitation prevents simply passing a single address and length to SHA1_Update or to inflate. Instead we need to loop until we have processed all data of interest. As we loop over the data we are always interested in reusing the same window 'cursor', as the prior window will no longer be of any use to us. This allows the use_pack() call to automatically decrement the use count of the prior window before setting up access for us to the next window. Within each loop we need to make use of the available length output parameter of use_pack() to tell us how many bytes are available in the current memory region, as we cannot tell otherwise. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-23 08:34:13 +01:00			`static int check_pack_inflate(struct packed_git *p,`
			`struct pack_window **w_curs,`
			`unsigned long offset,`
			`unsigned long len,`
			`unsigned long expect)`
			`{`
			`z_stream stream;`
			`unsigned char fakebuf[4096], *in;`
			`int st;`

			`memset(&stream, 0, sizeof(stream));`
			`inflateInit(&stream);`
			`do {`
			`in = use_pack(p, w_curs, offset, &stream.avail_in);`
			`stream.next_in = in;`
			`stream.next_out = fakebuf;`
			`stream.avail_out = sizeof(fakebuf);`
			`st = inflate(&stream, Z_FINISH);`
			`offset += stream.next_in - in;`
			`} while (st == Z_OK \|\| st == Z_BUF_ERROR);`
			`inflateEnd(&stream);`
			`return (st == Z_STREAM_END &&`
			`stream.total_out == expect &&`
			`stream.total_in == len) ? 0 : -1;`
			`}`

			`static void copy_pack_data(struct sha1file *f,`
			`struct packed_git *p,`
			`struct pack_window **w_curs,`
			`unsigned long offset,`
			`unsigned long len)`
			`{`
			`unsigned char *in;`
			`unsigned int avail;`

			`while (len) {`
			`in = use_pack(p, w_curs, offset, &avail);`
			`if (avail > len)`
			`avail = len;`
			`sha1write(f, in, avail);`
			`offset += avail;`
			`len -= avail;`
			`}`
			`}`

			`static int check_loose_inflate(unsigned char *data, unsigned long len, unsigned long expect)`
pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-02 00:05:12 +02:00			`{`
more lightweight revalidation while reusing deflated stream in packing When copying from an existing pack and when copying from a loose object with new style header, the code makes sure that the piece we are going to copy out inflates well and inflate() consumes the data in full while doing so. The check to see if the xdelta really apply is quite expensive as you described, because you would need to have the image of the base object which can be represented as a delta against something else. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-04 06:09:18 +02:00			`z_stream stream;`
			`unsigned char fakebuf[4096];`
			`int st;`

			`memset(&stream, 0, sizeof(stream));`
			`stream.next_in = data;`
			`stream.avail_in = len;`
			`stream.next_out = fakebuf;`
			`stream.avail_out = sizeof(fakebuf);`
			`inflateInit(&stream);`

			`while (1) {`
			`st = inflate(&stream, Z_FINISH);`
			`if (st == Z_STREAM_END \|\| st == Z_OK) {`
			`st = (stream.total_out == expect &&`
			`stream.total_in == len) ? 0 : -1;`
			`break;`
			`}`
			`if (st != Z_BUF_ERROR) {`
			`st = -1;`
			`break;`
			`}`
			`stream.next_out = fakebuf;`
			`stream.avail_out = sizeof(fakebuf);`
			`}`
			`inflateEnd(&stream);`
			`return st;`
pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-02 00:05:12 +02:00			`}`

			`static int revalidate_loose_object(struct object_entry *entry,`
			`unsigned char *map,`
			`unsigned long mapsize)`
			`{`
			`/* we already know this is a loose object with new type header. */`
more lightweight revalidation while reusing deflated stream in packing When copying from an existing pack and when copying from a loose object with new style header, the code makes sure that the piece we are going to copy out inflates well and inflate() consumes the data in full while doing so. The check to see if the xdelta really apply is quite expensive as you described, because you would need to have the image of the base object which can be represented as a delta against something else. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-04 06:09:18 +02:00			`enum object_type type;`
			`unsigned long size, used;`
pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-02 00:05:12 +02:00
			`if (pack_to_stdout)`
			`return 0;`

more lightweight revalidation while reusing deflated stream in packing When copying from an existing pack and when copying from a loose object with new style header, the code makes sure that the piece we are going to copy out inflates well and inflate() consumes the data in full while doing so. The check to see if the xdelta really apply is quite expensive as you described, because you would need to have the image of the base object which can be represented as a delta against something else. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-04 06:09:18 +02:00			`used = unpack_object_header_gently(map, mapsize, &type, &size);`
			`if (!used)`
			`return -1;`
			`map += used;`
			`mapsize -= used;`
Loop over pack_windows when inflating/accessing data. When multiple mmaps start getting used for all pack file access it is not possible to get all data associated with a specific object in one contiguous memory region. This limitation prevents simply passing a single address and length to SHA1_Update or to inflate. Instead we need to loop until we have processed all data of interest. As we loop over the data we are always interested in reusing the same window 'cursor', as the prior window will no longer be of any use to us. This allows the use_pack() call to automatically decrement the use count of the prior window before setting up access for us to the next window. Within each loop we need to make use of the available length output parameter of use_pack() to tell us how many bytes are available in the current memory region, as we cannot tell otherwise. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-23 08:34:13 +01:00			`return check_loose_inflate(map, mapsize, size);`
pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-02 00:05:12 +02:00			`}`

Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`static unsigned long write_object(struct sha1file *f,`
			`struct object_entry *entry)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`{`
			`unsigned long size;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`void *buf;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`unsigned char header[10];`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`unsigned hdrlen, datalen;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`enum object_type obj_type;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`int to_reuse = 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`obj_type = entry->type;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`if (! entry->in_pack)`
			`to_reuse = 0; /* can't reuse what we don't have */`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`else if (obj_type == OBJ_REF_DELTA \|\| obj_type == OBJ_OFS_DELTA)`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`to_reuse = 1; /* check_object() decided it for us */`
			`else if (obj_type != entry->in_pack_type)`
			`to_reuse = 0; /* pack has delta which is unusable */`
			`else if (entry->delta)`
			`to_reuse = 0; /* we want to pack afresh */`
			`else`
			`to_reuse = 1; /* we have it in-pack undeltified,`
			`* and we do not need to deltify it.`
			`*/`

pack-objects: reuse deflated data from new-style loose objects. When packing an object without deltifying, if the data is stored in a loose object that is encoded with a new style header, copy it without inflating and deflating. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-18 00:06:23 +02:00			`if (!entry->in_pack && !entry->delta) {`
			`unsigned char *map;`
			`unsigned long mapsize;`
			`map = map_sha1_file(entry->sha1, &mapsize);`
			`if (map && !legacy_loose_object(map)) {`
			`/* We can copy straight into the pack file */`
pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-02 00:05:12 +02:00			`if (revalidate_loose_object(entry, map, mapsize))`
			`die("corrupt loose object %s",`
			`sha1_to_hex(entry->sha1));`
pack-objects: reuse deflated data from new-style loose objects. When packing an object without deltifying, if the data is stored in a loose object that is encoded with a new style header, copy it without inflating and deflating. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-18 00:06:23 +02:00			`sha1write(f, map, mapsize);`
			`munmap(map, mapsize);`
			`written++;`
			`reused++;`
			`return mapsize;`
			`}`
			`if (map)`
			`munmap(map, mapsize);`
			`}`

pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-02 00:05:12 +02:00			`if (!to_reuse) {`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`buf = read_sha1_file(entry->sha1, &type, &size);`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`if (!buf)`
			`die("unable to read %s", sha1_to_hex(entry->sha1));`
			`if (size != entry->size)`
			`die("object %s size inconsistency (%lu vs %lu)",`
			`sha1_to_hex(entry->sha1), size, entry->size);`
			`if (entry->delta) {`
			`buf = delta_against(buf, size, entry);`
			`size = entry->delta_size;`
make git-pack-objects able to create deltas with offset to base This is enabled with --delta-base-offset only, and doesn't work with pack data reuse yet. The idea is to allow for the fetch protocol to use an extension flag to notify the remote end that --delta-base-offset can be used with git-pack-objects. Eventually git-repack will always provide this flag. With this, all delta base objects are now pushed before deltas that depend on them. This is a requirements for OBJ_OFS_DELTA. This is not a requirement for OBJ_REF_DELTA but always doing so makes the code simpler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:09:44 +02:00			`obj_type = (allow_ofs_delta && entry->delta->offset) ?`
			`OBJ_OFS_DELTA : OBJ_REF_DELTA;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`
			`/*`
			`* The object header is a byte of 'type' followed by zero or`
make git-pack-objects able to create deltas with offset to base This is enabled with --delta-base-offset only, and doesn't work with pack data reuse yet. The idea is to allow for the fetch protocol to use an extension flag to notify the remote end that --delta-base-offset can be used with git-pack-objects. Eventually git-repack will always provide this flag. With this, all delta base objects are now pushed before deltas that depend on them. This is a requirements for OBJ_OFS_DELTA. This is not a requirement for OBJ_REF_DELTA but always doing so makes the code simpler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:09:44 +02:00			`* more bytes of length.`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`*/`
			`hdrlen = encode_header(obj_type, size, header);`
			`sha1write(f, header, hdrlen);`

make git-pack-objects able to create deltas with offset to base This is enabled with --delta-base-offset only, and doesn't work with pack data reuse yet. The idea is to allow for the fetch protocol to use an extension flag to notify the remote end that --delta-base-offset can be used with git-pack-objects. Eventually git-repack will always provide this flag. With this, all delta base objects are now pushed before deltas that depend on them. This is a requirements for OBJ_OFS_DELTA. This is not a requirement for OBJ_REF_DELTA but always doing so makes the code simpler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:09:44 +02:00			`if (obj_type == OBJ_OFS_DELTA) {`
			`/*`
			`* Deltas with relative base contain an additional`
			`* encoding of the relative offset for the delta`
			`* base from this object's position in the pack.`
			`*/`
			`unsigned long ofs = entry->offset - entry->delta->offset;`
			`unsigned pos = sizeof(header) - 1;`
			`header[pos] = ofs & 127;`
			`while (ofs >>= 7)`
			`header[--pos] = 128 \| (--ofs & 127);`
			`sha1write(f, header + pos, sizeof(header) - pos);`
			`hdrlen += sizeof(header) - pos;`
			`} else if (obj_type == OBJ_REF_DELTA) {`
			`/*`
			`* Deltas with a base reference contain`
			`* an additional 20 bytes for the base sha1.`
			`*/`
			`sha1write(f, entry->delta->sha1, 20);`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`hdrlen += 20;`
			`}`
			`datalen = sha1write_compressed(f, buf, size);`
			`free(buf);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`else {`
			`struct packed_git *p = entry->in_pack;`
Replace use_packed_git with window cursors. Part of the implementation concept of the sliding mmap window for pack access is to permit multiple windows per pack to be mapped independently. Since the inuse_cnt is associated with the mmap and not with the file, this value is in struct pack_window and needs to be incremented/decremented for each pack_window accessed by any code. To faciliate that implementation we need to replace all uses of use_packed_git() and unuse_packed_git() with a different API that follows struct pack_window objects rather than struct packed_git. The way this works is when we need to start accessing a pack for the first time we should setup a new window 'cursor' by declaring a local and setting it to NULL: struct pack_windows w_curs = NULL; To obtain the memory region which contains a specific section of the pack file we invoke use_pack(), supplying the address of our current window cursor: unsigned int len; unsigned char addr = use_pack(p, &w_curs, offset, &len); the returned address `addr` will be the first byte at `offset` within the pack file. The optional variable len will also be updated with the number of bytes remaining following the address. Multiple calls to use_pack() with the same window cursor will update the window cursor, moving it from one window to another when necessary. In this way each window cursor variable maintains only one struct pack_window inuse at a time. Finally before exiting the scope which originally declared the window cursor we must invoke unuse_pack() to unuse the current window (which may be different from the one that was first obtained from use_pack): unuse_pack(&w_curs); This implementation is still not complete with regards to multiple windows, as only one window per pack file is supported right now. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-23 08:34:08 +01:00			`struct pack_window *w_curs = NULL;`
Loop over pack_windows when inflating/accessing data. When multiple mmaps start getting used for all pack file access it is not possible to get all data associated with a specific object in one contiguous memory region. This limitation prevents simply passing a single address and length to SHA1_Update or to inflate. Instead we need to loop until we have processed all data of interest. As we loop over the data we are always interested in reusing the same window 'cursor', as the prior window will no longer be of any use to us. This allows the use_pack() call to automatically decrement the use count of the prior window before setting up access for us to the next window. Within each loop we need to make use of the available length output parameter of use_pack() to tell us how many bytes are available in the current memory region, as we cannot tell otherwise. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-23 08:34:13 +01:00			`unsigned long offset;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`if (entry->delta) {`
			`obj_type = (allow_ofs_delta && entry->delta->offset) ?`
			`OBJ_OFS_DELTA : OBJ_REF_DELTA;`
			`reused_delta++;`
			`}`
			`hdrlen = encode_header(obj_type, entry->size, header);`
			`sha1write(f, header, hdrlen);`
			`if (obj_type == OBJ_OFS_DELTA) {`
			`unsigned long ofs = entry->offset - entry->delta->offset;`
			`unsigned pos = sizeof(header) - 1;`
			`header[pos] = ofs & 127;`
			`while (ofs >>= 7)`
			`header[--pos] = 128 \| (--ofs & 127);`
			`sha1write(f, header + pos, sizeof(header) - pos);`
			`hdrlen += sizeof(header) - pos;`
			`} else if (obj_type == OBJ_REF_DELTA) {`
			`sha1write(f, entry->delta->sha1, 20);`
			`hdrlen += 20;`
			`}`
pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-02 00:05:12 +02:00
Loop over pack_windows when inflating/accessing data. When multiple mmaps start getting used for all pack file access it is not possible to get all data associated with a specific object in one contiguous memory region. This limitation prevents simply passing a single address and length to SHA1_Update or to inflate. Instead we need to loop until we have processed all data of interest. As we loop over the data we are always interested in reusing the same window 'cursor', as the prior window will no longer be of any use to us. This allows the use_pack() call to automatically decrement the use count of the prior window before setting up access for us to the next window. Within each loop we need to make use of the available length output parameter of use_pack() to tell us how many bytes are available in the current memory region, as we cannot tell otherwise. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-23 08:34:13 +01:00			`offset = entry->in_pack_offset + entry->in_pack_header_size;`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`datalen = find_packed_object_size(p, entry->in_pack_offset)`
			`- entry->in_pack_header_size;`
Loop over pack_windows when inflating/accessing data. When multiple mmaps start getting used for all pack file access it is not possible to get all data associated with a specific object in one contiguous memory region. This limitation prevents simply passing a single address and length to SHA1_Update or to inflate. Instead we need to loop until we have processed all data of interest. As we loop over the data we are always interested in reusing the same window 'cursor', as the prior window will no longer be of any use to us. This allows the use_pack() call to automatically decrement the use count of the prior window before setting up access for us to the next window. Within each loop we need to make use of the available length output parameter of use_pack() to tell us how many bytes are available in the current memory region, as we cannot tell otherwise. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-23 08:34:13 +01:00			`if (!pack_to_stdout && check_pack_inflate(p, &w_curs,`
			`offset, datalen, entry->size))`
pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-02 00:05:12 +02:00			`die("corrupt delta in pack %s", sha1_to_hex(entry->sha1));`
Loop over pack_windows when inflating/accessing data. When multiple mmaps start getting used for all pack file access it is not possible to get all data associated with a specific object in one contiguous memory region. This limitation prevents simply passing a single address and length to SHA1_Update or to inflate. Instead we need to loop until we have processed all data of interest. As we loop over the data we are always interested in reusing the same window 'cursor', as the prior window will no longer be of any use to us. This allows the use_pack() call to automatically decrement the use count of the prior window before setting up access for us to the next window. Within each loop we need to make use of the available length output parameter of use_pack() to tell us how many bytes are available in the current memory region, as we cannot tell otherwise. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-23 08:34:13 +01:00			`copy_pack_data(f, p, &w_curs, offset, datalen);`
Replace use_packed_git with window cursors. Part of the implementation concept of the sliding mmap window for pack access is to permit multiple windows per pack to be mapped independently. Since the inuse_cnt is associated with the mmap and not with the file, this value is in struct pack_window and needs to be incremented/decremented for each pack_window accessed by any code. To faciliate that implementation we need to replace all uses of use_packed_git() and unuse_packed_git() with a different API that follows struct pack_window objects rather than struct packed_git. The way this works is when we need to start accessing a pack for the first time we should setup a new window 'cursor' by declaring a local and setting it to NULL: struct pack_windows w_curs = NULL; To obtain the memory region which contains a specific section of the pack file we invoke use_pack(), supplying the address of our current window cursor: unsigned int len; unsigned char addr = use_pack(p, &w_curs, offset, &len); the returned address `addr` will be the first byte at `offset` within the pack file. The optional variable len will also be updated with the number of bytes remaining following the address. Multiple calls to use_pack() with the same window cursor will update the window cursor, moving it from one window to another when necessary. In this way each window cursor variable maintains only one struct pack_window inuse at a time. Finally before exiting the scope which originally declared the window cursor we must invoke unuse_pack() to unuse the current window (which may be different from the one that was first obtained from use_pack): unuse_pack(&w_curs); This implementation is still not complete with regards to multiple windows, as only one window per pack file is supported right now. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-23 08:34:08 +01:00			`unuse_pack(&w_curs);`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`reused++;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`}`
make git-pack-objects able to create deltas with offset to base This is enabled with --delta-base-offset only, and doesn't work with pack data reuse yet. The idea is to allow for the fetch protocol to use an extension flag to notify the remote end that --delta-base-offset can be used with git-pack-objects. Eventually git-repack will always provide this flag. With this, all delta base objects are now pushed before deltas that depend on them. This is a requirements for OBJ_OFS_DELTA. This is not a requirement for OBJ_REF_DELTA but always doing so makes the code simpler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:09:44 +02:00			`if (entry->delta)`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`written_delta++;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`written++;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`return hdrlen + datalen;`
			`}`

[PATCH] Emit base objects of a delta chain when the delta is output. Deltas are useless by themselves and when you use them you need to get to their base objects. A base object should inherit recency from the most recent deltified object that is based on it and that is what this patch teaches git-pack-objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-29 02:49:27 +02:00			`static unsigned long write_one(struct sha1file *f,`
			`struct object_entry *e,`
			`unsigned long offset)`
			`{`
make git-pack-objects able to create deltas with offset to base This is enabled with --delta-base-offset only, and doesn't work with pack data reuse yet. The idea is to allow for the fetch protocol to use an extension flag to notify the remote end that --delta-base-offset can be used with git-pack-objects. Eventually git-repack will always provide this flag. With this, all delta base objects are now pushed before deltas that depend on them. This is a requirements for OBJ_OFS_DELTA. This is not a requirement for OBJ_REF_DELTA but always doing so makes the code simpler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:09:44 +02:00			`if (e->offset \|\| e->preferred_base)`
[PATCH] Emit base objects of a delta chain when the delta is output. Deltas are useless by themselves and when you use them you need to get to their base objects. A base object should inherit recency from the most recent deltified object that is based on it and that is what this patch teaches git-pack-objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-29 02:49:27 +02:00			`/* offset starts from header size and cannot be zero`
			`* if it is written already.`
			`*/`
			`return offset;`
make git-pack-objects able to create deltas with offset to base This is enabled with --delta-base-offset only, and doesn't work with pack data reuse yet. The idea is to allow for the fetch protocol to use an extension flag to notify the remote end that --delta-base-offset can be used with git-pack-objects. Eventually git-repack will always provide this flag. With this, all delta base objects are now pushed before deltas that depend on them. This is a requirements for OBJ_OFS_DELTA. This is not a requirement for OBJ_REF_DELTA but always doing so makes the code simpler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:09:44 +02:00			`/* if we are deltified, write out its base object first. */`
[PATCH] Emit base objects of a delta chain when the delta is output. Deltas are useless by themselves and when you use them you need to get to their base objects. A base object should inherit recency from the most recent deltified object that is based on it and that is what this patch teaches git-pack-objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-29 02:49:27 +02:00			`if (e->delta)`
			`offset = write_one(f, e->delta, offset);`
make git-pack-objects able to create deltas with offset to base This is enabled with --delta-base-offset only, and doesn't work with pack data reuse yet. The idea is to allow for the fetch protocol to use an extension flag to notify the remote end that --delta-base-offset can be used with git-pack-objects. Eventually git-repack will always provide this flag. With this, all delta base objects are now pushed before deltas that depend on them. This is a requirements for OBJ_OFS_DELTA. This is not a requirement for OBJ_REF_DELTA but always doing so makes the code simpler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:09:44 +02:00			`e->offset = offset;`
			`return offset + write_object(f, e);`
[PATCH] Emit base objects of a delta chain when the delta is output. Deltas are useless by themselves and when you use them you need to get to their base objects. A base object should inherit recency from the most recent deltified object that is based on it and that is what this patch teaches git-pack-objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-29 02:49:27 +02:00			`}`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static void write_pack_file(void)`
			`{`
			`int i;`
git-pack-objects: add "--stdout" flag to write the pack file to stdout This also suppresses creation of the index file. 2005-06-28 20:10:48 +02:00			`struct sha1file *f;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`unsigned long offset;`
			`struct pack_header hdr;`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`unsigned last_percent = 999;`
make git-push a bit more verbose Currently git-push displays progress status for the local packing of objects to send, but nothing once it starts to push it over the connection. Having progress status in that later case is especially nice when pushing lots of objects over a slow network link. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 22:58:32 +01:00			`int do_progress = progress;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
make git-push a bit more verbose Currently git-push displays progress status for the local packing of objects to send, but nothing once it starts to push it over the connection. Having progress status in that later case is especially nice when pushing lots of objects over a slow network link. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 22:58:32 +01:00			`if (!base_name) {`
git-pack-objects: add "--stdout" flag to write the pack file to stdout This also suppresses creation of the index file. 2005-06-28 20:10:48 +02:00			`f = sha1fd(1, "<stdout>");`
make git-push a bit more verbose Currently git-push displays progress status for the local packing of objects to send, but nothing once it starts to push it over the connection. Having progress status in that later case is especially nice when pushing lots of objects over a slow network link. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 22:58:32 +01:00			`do_progress >>= 1;`
			`}`
			`else`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`f = sha1create("%s-%s.%s", base_name,`
			`sha1_to_hex(object_list_sha1), "pack");`
			`if (do_progress)`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`fprintf(stderr, "Writing %d objects.\n", nr_result);`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`hdr.hdr_signature = htonl(PACK_SIGNATURE);`
Make git pack files use little-endian size encoding This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format. 2005-06-29 07:15:57 +02:00			`hdr.hdr_version = htonl(PACK_VERSION);`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`hdr.hdr_entries = htonl(nr_result);`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`sha1write(f, &hdr, sizeof(hdr));`
			`offset = sizeof(hdr);`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`if (!nr_result)`
			`goto done;`
also adds progress when actually writing a pack If that pack is big, it takes significant time to write and might benefit from some more eye candies as well. This is however disabled when the pack is written to stdout since in that case the output is usually piped into unpack_objects which already does its own progress reporting. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 23:41:32 +01:00			`for (i = 0; i < nr_objects; i++) {`
[PATCH] Emit base objects of a delta chain when the delta is output. Deltas are useless by themselves and when you use them you need to get to their base objects. A base object should inherit recency from the most recent deltified object that is based on it and that is what this patch teaches git-pack-objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-29 02:49:27 +02:00			`offset = write_one(f, objects + i, offset);`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (do_progress) {`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`unsigned percent = written * 100 / nr_result;`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (progress_update \|\| percent != last_percent) {`
			`fprintf(stderr, "%4u%% (%u/%u) done\r",`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`percent, written, nr_result);`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`progress_update = 0;`
			`last_percent = percent;`
			`}`
also adds progress when actually writing a pack If that pack is big, it takes significant time to write and might benefit from some more eye candies as well. This is however disabled when the pack is written to stdout since in that case the output is usually piped into unpack_objects which already does its own progress reporting. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 23:41:32 +01:00			`}`
			`}`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (do_progress)`
			`fputc('\n', stderr);`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`done:`
pack-objects: remove redundent status information The final 'nr_result' and 'written' values must always be the same otherwise we're in deep trouble. So let's remove a redundent report. And for paranoia sake let's make sure those two variables are actually equal after all objects are written (one never knows). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-29 23:15:48 +01:00			`if (written != nr_result)`
			`die("wrote %d objects while expecting %d", written, nr_result);`
csum-file interface updates: return resulting SHA1 Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to. 2005-06-27 07:01:46 +02:00			`sha1close(f, pack_file_sha1, 1);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

			`static void write_index_file(void)`
			`{`
			`int i;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`struct sha1file *f = sha1create("%s-%s.%s", base_name,`
			`sha1_to_hex(object_list_sha1), "idx");`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`struct object_entry **list = sorted_by_sha;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`struct object_entry **last = list + nr_result;`
Use fixed-size integers for .idx file I/O This attempts to finish what Simon started in the previous commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-18 08:17:28 +01:00			`uint32_t array[256];`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
			`/*`
			`* Write the first-level table (the list is sorted,`
			`* but we use a 256-entry lookup to be able to avoid`
csum-file interface updates: return resulting SHA1 Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to. 2005-06-27 07:01:46 +02:00			`* having to do eight extra binary search iterations).`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`*/`
			`for (i = 0; i < 256; i++) {`
			`struct object_entry **next = list;`
			`while (next < last) {`
			`struct object_entry entry = next;`
			`if (entry->sha1[0] != i)`
			`break;`
			`next++;`
			`}`
			`array[i] = htonl(next - sorted_by_sha);`
			`list = next;`
			`}`
Use fixed-size integers for .idx file I/O This attempts to finish what Simon started in the previous commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-18 08:17:28 +01:00			`sha1write(f, array, 256 * 4);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
			`/*`
			`* Write the actual SHA1 entries..`
			`*/`
			`list = sorted_by_sha;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`for (i = 0; i < nr_result; i++) {`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`struct object_entry entry = list++;`
Use fixed-size integers for .idx file I/O This attempts to finish what Simon started in the previous commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-18 08:17:28 +01:00			`uint32_t offset = htonl(entry->offset);`
git-pack-objects: write the pack files with a SHA1 csum We want to be able to check their integrity later, and putting the sha1-sum of the contents at the end is a good thing. The writing routines are generic, so we could try to re-use them for the index file, instead of having the same logic duplicated. Update unpack-objects to know about the extra 20 bytes at the end of the index. 2005-06-27 05:27:56 +02:00			`sha1write(f, &offset, 4);`
			`sha1write(f, entry->sha1, 20);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`
csum-file interface updates: return resulting SHA1 Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to. 2005-06-27 07:01:46 +02:00			`sha1write(f, pack_file_sha1, 20);`
			`sha1close(f, NULL, 1);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`static int locate_object_entry_hash(const unsigned char *sha1)`
			`{`
			`int i;`
			`unsigned int ui;`
			`memcpy(&ui, sha1, sizeof(unsigned int));`
			`i = ui % object_ix_hashsz;`
			`while (0 < object_ix[i]) {`
Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char sha1, const unsigned char sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-17 20:54:57 +02:00			`if (!hashcmp(sha1, objects[object_ix[i] - 1].sha1))`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`return i;`
			`if (++i == object_ix_hashsz)`
			`i = 0;`
			`}`
			`return -1 - i;`
			`}`

			`static struct object_entry locate_object_entry(const unsigned char sha1)`
			`{`
			`int i;`

			`if (!object_ix_hashsz)`
			`return NULL;`

			`i = locate_object_entry_hash(sha1);`
			`if (0 <= i)`
			`return &objects[object_ix[i]-1];`
			`return NULL;`
			`}`

			`static void rehash_objects(void)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`{`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`int i;`
			`struct object_entry *oe;`

			`object_ix_hashsz = nr_objects * 3;`
			`if (object_ix_hashsz < 1024)`
			`object_ix_hashsz = 1024;`
			`object_ix = xrealloc(object_ix, sizeof(int) * object_ix_hashsz);`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`memset(object_ix, 0, sizeof(int) * object_ix_hashsz);`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`for (i = 0, oe = objects; i < nr_objects; i++, oe++) {`
			`int ix = locate_object_entry_hash(oe->sha1);`
			`if (0 <= ix)`
			`continue;`
			`ix = -1 - ix;`
			`object_ix[ix] = i + 1;`
			`}`
			`}`

pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-05 21:03:31 +02:00			`static unsigned name_hash(const char *name)`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`{`
pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-05 21:03:31 +02:00			`unsigned char c;`
			`unsigned hash = 0;`

pack-objects: hash basename and direname a bit differently. ...so that "Makefile"s from different revs are sorted together, separate from "t/Makefile"s, but close enough. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-24 08:27:49 +01:00			`/*`
pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-05 21:03:31 +02:00			`* This effectively just creates a sortable number from the`
			`* last sixteen non-whitespace characters. Last characters`
			`* count "most", so things that end in ".c" sort together.`
pack-objects: hash basename and direname a bit differently. ...so that "Makefile"s from different revs are sorted together, separate from "t/Makefile"s, but close enough. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-24 08:27:49 +01:00			`*/`
pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-05 21:03:31 +02:00			`while ((c = *name++) != 0) {`
			`if (isspace(c))`
			`continue;`
			`hash = (hash >> 2) + (c << 24);`
			`}`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`return hash;`
			`}`

			`static int add_object_entry(const unsigned char *sha1, unsigned hash, int exclude)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`{`
			`unsigned int idx = nr_objects;`
			`struct object_entry *entry;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`struct packed_git *p;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`unsigned int found_offset = 0;`
			`struct packed_git *found_pack = NULL;`
pack-objects: thin pack micro-optimization. Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 06:45:45 +01:00			`int ix, status = 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (!exclude) {`
Add support for "local" packing This adds the "--local" flag to git-pack-objects, which acts like "--incremental", except that instead of ignoring all packed objects, it only ignores objects that are packed and in an alternate object tree. As a result, it effectively only does a local re-pack: any remote-packed objects will stay in the alternate object directories. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-14 00:38:28 +02:00			`for (p = packed_git; p; p = p->next) {`
many cleanups to sha1_file.c Those cleanups are mainly to set the table for the support of deltas with base objects referenced by offsets instead of sha1. This means that many pack lookup functions are converted to take a pack/offset tuple instead of a sha1. This eliminates many struct pack_entry usages since this structure carried redundent information in many cases, and it increased stack footprint needlessly for a couple recursively called functions that used to declare a local copy of it for every recursion loop. In the process, packed_object_info_detail() has been reorganized as well so to look much saner and more amenable to deltas with offset support. Finally the appropriate adjustments have been made to functions that depend on the above changes. But there is no functionality changes yet simply some code refactoring at this point. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:05:37 +02:00			`unsigned long offset = find_pack_entry_one(sha1, p);`
			`if (offset) {`
Add support for "local" packing This adds the "--local" flag to git-pack-objects, which acts like "--incremental", except that instead of ignoring all packed objects, it only ignores objects that are packed and in an alternate object tree. As a result, it effectively only does a local re-pack: any remote-packed objects will stay in the alternate object directories. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-14 00:38:28 +02:00			`if (incremental)`
			`return 0;`
			`if (local && !p->pack_local)`
			`return 0;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (!found_pack) {`
many cleanups to sha1_file.c Those cleanups are mainly to set the table for the support of deltas with base objects referenced by offsets instead of sha1. This means that many pack lookup functions are converted to take a pack/offset tuple instead of a sha1. This eliminates many struct pack_entry usages since this structure carried redundent information in many cases, and it increased stack footprint needlessly for a couple recursively called functions that used to declare a local copy of it for every recursion loop. In the process, packed_object_info_detail() has been reorganized as well so to look much saner and more amenable to deltas with offset support. Finally the appropriate adjustments have been made to functions that depend on the above changes. But there is no functionality changes yet simply some code refactoring at this point. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:05:37 +02:00			`found_offset = offset;`
			`found_pack = p;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`}`
Add support for "local" packing This adds the "--local" flag to git-pack-objects, which acts like "--incremental", except that instead of ignoring all packed objects, it only ignores objects that are packed and in an alternate object tree. As a result, it effectively only does a local re-pack: any remote-packed objects will stay in the alternate object directories. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-14 00:38:28 +02:00			`}`
			`}`
			`}`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if ((entry = locate_object_entry(sha1)) != NULL)`
			`goto already_added;`
Add "--incremental" flag to git-pack-objects It won't add an object that is already in a pack to the new pack. 2005-07-03 22:08:40 +02:00
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`if (idx >= nr_alloc) {`
			`unsigned int needed = (idx + 1024) * 3 / 2;`
			`objects = xrealloc(objects, needed * sizeof(*entry));`
			`nr_alloc = needed;`
			`}`
			`entry = objects + idx;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`nr_objects = idx + 1;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`memset(entry, 0, sizeof(*entry));`
Convert memcpy(a,b,20) to hashcpy(a,b). This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void. This is a reasonable tradeoff as most call sites already use unsigned char and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-23 08:49:00 +02:00			`hashcpy(entry->sha1, sha1);`
git-pack-objects: use name information (if any) to sort objects for packing. This is incredibly cheezy. But it's cheap, and it works pretty well. 2005-06-27 00:27:28 +02:00			`entry->hash = hash;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00
			`if (object_ix_hashsz * 3 <= nr_objects * 4)`
			`rehash_objects();`
			`else {`
			`ix = locate_object_entry_hash(entry->sha1);`
			`if (0 <= ix)`
			`die("internal error in object hashing.");`
			`object_ix[-1 - ix] = idx + 1;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`
pack-objects: thin pack micro-optimization. Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 06:45:45 +01:00			`status = 1;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00
			`already_added:`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`if (progress_update) {`
			`fprintf(stderr, "Counting objects...%d\r", nr_objects);`
			`progress_update = 0;`
			`}`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (exclude)`
			`entry->preferred_base = 1;`
			`else {`
			`if (found_pack) {`
			`entry->in_pack = found_pack;`
			`entry->in_pack_offset = found_offset;`
			`}`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`
pack-objects: thin pack micro-optimization. Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 06:45:45 +01:00			`return status;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`struct pbase_tree_cache {`
			`unsigned char sha1[20];`
			`int ref;`
			`int temporary;`
			`void *tree_data;`
			`unsigned long tree_size;`
			`};`

			`static struct pbase_tree_cache *(pbase_tree_cache[256]);`
			`static int pbase_tree_cache_ix(const unsigned char *sha1)`
			`{`
			`return sha1[0] % ARRAY_SIZE(pbase_tree_cache);`
			`}`
			`static int pbase_tree_cache_ix_incr(int ix)`
			`{`
			`return (ix+1) % ARRAY_SIZE(pbase_tree_cache);`
			`}`

			`static struct pbase_tree {`
			`struct pbase_tree *next;`
			`/* This is a phony "cache" entry; we are not`
			`* going to evict it nor find it through _get()`
			`* mechanism -- this is for the toplevel node that`
			`* would almost always change with any commit.`
			`*/`
			`struct pbase_tree_cache pcache;`
			`} *pbase_tree;`

			`static struct pbase_tree_cache pbase_tree_get(const unsigned char sha1)`
			`{`
			`struct pbase_tree_cache ent, nent;`
			`void *data;`
			`unsigned long size;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`int neigh;`
			`int my_ix = pbase_tree_cache_ix(sha1);`
			`int available_ix = -1;`

			`/* pbase-tree-cache acts as a limited hashtable.`
			`* your object will be found at your index or within a few`
			`* slots after that slot if it is cached.`
			`*/`
			`for (neigh = 0; neigh < 8; neigh++) {`
			`ent = pbase_tree_cache[my_ix];`
Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char sha1, const unsigned char sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-17 20:54:57 +02:00			`if (ent && !hashcmp(ent->sha1, sha1)) {`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`ent->ref++;`
			`return ent;`
			`}`
			`else if (((available_ix < 0) && (!ent \|\| !ent->ref)) \|\|`
			`((0 <= available_ix) &&`
			`(!ent && pbase_tree_cache[available_ix])))`
			`available_ix = my_ix;`
			`if (!ent)`
			`break;`
			`my_ix = pbase_tree_cache_ix_incr(my_ix);`
			`}`

			`/* Did not find one. Either we got a bogus request or`
			`* we need to read and perhaps cache.`
			`*/`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`data = read_sha1_file(sha1, &type, &size);`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`if (!data)`
			`return NULL;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`if (type != OBJ_TREE) {`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`free(data);`
			`return NULL;`
			`}`

			`/* We need to either cache or return a throwaway copy */`

			`if (available_ix < 0)`
			`ent = NULL;`
			`else {`
			`ent = pbase_tree_cache[available_ix];`
			`my_ix = available_ix;`
			`}`

			`if (!ent) {`
			`nent = xmalloc(sizeof(*nent));`
			`nent->temporary = (available_ix < 0);`
			`}`
			`else {`
			`/* evict and reuse */`
			`free(ent->tree_data);`
			`nent = ent;`
			`}`
Convert memcpy(a,b,20) to hashcpy(a,b). This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void. This is a reasonable tradeoff as most call sites already use unsigned char and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-23 08:49:00 +02:00			`hashcpy(nent->sha1, sha1);`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`nent->tree_data = data;`
			`nent->tree_size = size;`
			`nent->ref = 1;`
			`if (!nent->temporary)`
			`pbase_tree_cache[my_ix] = nent;`
			`return nent;`
			`}`

			`static void pbase_tree_put(struct pbase_tree_cache *cache)`
			`{`
			`if (!cache->temporary) {`
			`cache->ref--;`
			`return;`
			`}`
			`free(cache->tree_data);`
			`free(cache);`
			`}`

			`static int name_cmp_len(const char *name)`
			`{`
			`int i;`
			`for (i = 0; name[i] && name[i] != '\n' && name[i] != '/'; i++)`
			`;`
			`return i;`
			`}`

			`static void add_pbase_object(struct tree_desc *tree,`
			`const char *name,`
pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-05 21:03:31 +02:00			`int cmplen,`
			`const char *fullname)`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`{`
tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-30 18:45:45 +02:00			`struct name_entry entry;`

			`while (tree_entry(tree,&entry)) {`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`unsigned long size;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00
tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-30 18:45:45 +02:00			`if (entry.pathlen != cmplen \|\|`
			`memcmp(entry.path, name, cmplen) \|\|`
			`!has_sha1_file(entry.sha1) \|\|`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`(type = sha1_object_info(entry.sha1, &size)) < 0)`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`continue;`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`if (name[cmplen] != '/') {`
pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-05 21:03:31 +02:00			`unsigned hash = name_hash(fullname);`
tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-30 18:45:45 +02:00			`add_object_entry(entry.sha1, hash, 1);`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`return;`
			`}`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`if (type == OBJ_TREE) {`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`struct tree_desc sub;`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`struct pbase_tree_cache *tree;`
			`const char *down = name+cmplen+1;`
			`int downlen = name_cmp_len(down);`

tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-30 18:45:45 +02:00			`tree = pbase_tree_get(entry.sha1);`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`if (!tree)`
			`return;`
			`sub.buf = tree->tree_data;`
			`sub.size = tree->tree_size;`

pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-05 21:03:31 +02:00			`add_pbase_object(&sub, down, downlen, fullname);`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`pbase_tree_put(tree);`
			`}`
			`}`
			`}`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`static unsigned *done_pbase_paths;`
			`static int done_pbase_paths_num;`
			`static int done_pbase_paths_alloc;`
			`static int done_pbase_path_pos(unsigned hash)`
			`{`
			`int lo = 0;`
			`int hi = done_pbase_paths_num;`
			`while (lo < hi) {`
			`int mi = (hi + lo) / 2;`
			`if (done_pbase_paths[mi] == hash)`
			`return mi;`
			`if (done_pbase_paths[mi] < hash)`
			`hi = mi;`
			`else`
			`lo = mi + 1;`
			`}`
			`return -lo-1;`
			`}`

			`static int check_pbase_path(unsigned hash)`
			`{`
			`int pos = (!done_pbase_paths) ? -1 : done_pbase_path_pos(hash);`
			`if (0 <= pos)`
			`return 1;`
			`pos = -pos - 1;`
			`if (done_pbase_paths_alloc <= done_pbase_paths_num) {`
			`done_pbase_paths_alloc = alloc_nr(done_pbase_paths_alloc);`
			`done_pbase_paths = xrealloc(done_pbase_paths,`
			`done_pbase_paths_alloc *`
			`sizeof(unsigned));`
			`}`
			`done_pbase_paths_num++;`
			`if (pos < done_pbase_paths_num)`
			`memmove(done_pbase_paths + pos + 1,`
			`done_pbase_paths + pos,`
			`(done_pbase_paths_num - pos - 1) * sizeof(unsigned));`
			`done_pbase_paths[pos] = hash;`
			`return 0;`
			`}`

pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`static void add_preferred_base_object(const char *name, unsigned hash)`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`{`
			`struct pbase_tree *it;`
			`int cmplen = name_cmp_len(name);`

			`if (check_pbase_path(hash))`
			`return;`

			`for (it = pbase_tree; it; it = it->next) {`
			`if (cmplen == 0) {`
pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-05 21:03:31 +02:00			`hash = name_hash("");`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`add_object_entry(it->pcache.sha1, hash, 1);`
			`}`
			`else {`
			`struct tree_desc tree;`
			`tree.buf = it->pcache.tree_data;`
			`tree.size = it->pcache.tree_size;`
pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-05 21:03:31 +02:00			`add_pbase_object(&tree, name, cmplen, name);`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`}`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`
			`}`

Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`static void add_preferred_base(unsigned char *sha1)`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`{`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`struct pbase_tree *it;`
			`void *data;`
			`unsigned long size;`
			`unsigned char tree_sha1[20];`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`if (window <= num_preferred_base++)`
			`return;`

Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`data = read_object_with_reference(sha1, tree_type, &size, tree_sha1);`
			`if (!data)`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`return;`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00
			`for (it = pbase_tree; it; it = it->next) {`
Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char sha1, const unsigned char sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-17 20:54:57 +02:00			`if (!hashcmp(it->pcache.sha1, tree_sha1)) {`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`free(data);`
			`return;`
			`}`
			`}`

			`it = xcalloc(1, sizeof(*it));`
			`it->next = pbase_tree;`
			`pbase_tree = it;`

Convert memcpy(a,b,20) to hashcpy(a,b). This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void. This is a reasonable tradeoff as most call sites already use unsigned char and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-23 08:49:00 +02:00			`hashcpy(it->pcache.sha1, tree_sha1);`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`it->pcache.tree_data = data;`
			`it->pcache.tree_size = size;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static void check_object(struct object_entry *entry)`
			`{`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (entry->in_pack && !entry->preferred_base) {`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`struct packed_git *p = entry->in_pack;`
Replace use_packed_git with window cursors. Part of the implementation concept of the sliding mmap window for pack access is to permit multiple windows per pack to be mapped independently. Since the inuse_cnt is associated with the mmap and not with the file, this value is in struct pack_window and needs to be incremented/decremented for each pack_window accessed by any code. To faciliate that implementation we need to replace all uses of use_packed_git() and unuse_packed_git() with a different API that follows struct pack_window objects rather than struct packed_git. The way this works is when we need to start accessing a pack for the first time we should setup a new window 'cursor' by declaring a local and setting it to NULL: struct pack_windows w_curs = NULL; To obtain the memory region which contains a specific section of the pack file we invoke use_pack(), supplying the address of our current window cursor: unsigned int len; unsigned char addr = use_pack(p, &w_curs, offset, &len); the returned address `addr` will be the first byte at `offset` within the pack file. The optional variable len will also be updated with the number of bytes remaining following the address. Multiple calls to use_pack() with the same window cursor will update the window cursor, moving it from one window to another when necessary. In this way each window cursor variable maintains only one struct pack_window inuse at a time. Finally before exiting the scope which originally declared the window cursor we must invoke unuse_pack() to unuse the current window (which may be different from the one that was first obtained from use_pack): unuse_pack(&w_curs); This implementation is still not complete with regards to multiple windows, as only one window per pack file is supported right now. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-23 08:34:08 +01:00			`struct pack_window *w_curs = NULL;`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`unsigned long left = p->pack_size - entry->in_pack_offset;`
			`unsigned long size, used;`
			`unsigned char *buf;`
			`struct object_entry *base_entry = NULL;`

Replace use_packed_git with window cursors. Part of the implementation concept of the sliding mmap window for pack access is to permit multiple windows per pack to be mapped independently. Since the inuse_cnt is associated with the mmap and not with the file, this value is in struct pack_window and needs to be incremented/decremented for each pack_window accessed by any code. To faciliate that implementation we need to replace all uses of use_packed_git() and unuse_packed_git() with a different API that follows struct pack_window objects rather than struct packed_git. The way this works is when we need to start accessing a pack for the first time we should setup a new window 'cursor' by declaring a local and setting it to NULL: struct pack_windows w_curs = NULL; To obtain the memory region which contains a specific section of the pack file we invoke use_pack(), supplying the address of our current window cursor: unsigned int len; unsigned char addr = use_pack(p, &w_curs, offset, &len); the returned address `addr` will be the first byte at `offset` within the pack file. The optional variable len will also be updated with the number of bytes remaining following the address. Multiple calls to use_pack() with the same window cursor will update the window cursor, moving it from one window to another when necessary. In this way each window cursor variable maintains only one struct pack_window inuse at a time. Finally before exiting the scope which originally declared the window cursor we must invoke unuse_pack() to unuse the current window (which may be different from the one that was first obtained from use_pack): unuse_pack(&w_curs); This implementation is still not complete with regards to multiple windows, as only one window per pack file is supported right now. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-23 08:34:08 +01:00			`buf = use_pack(p, &w_curs, entry->in_pack_offset, NULL);`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
			`/* We want in_pack_type even if we do not reuse delta.`
			`* There is no point not reusing non-delta representations.`
			`*/`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`used = unpack_object_header_gently(buf, left,`
			`&entry->in_pack_type, &size);`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`/* Check if it is delta, and the base is also an object`
			`* we are going to pack. If so we will reuse the existing`
			`* delta.`
			`*/`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`if (!no_reuse_delta) {`
			`unsigned char c, *base_name;`
			`unsigned long ofs;`
pack-objects: fix use of use_pack(). The code calls use_pack() to make that the variably encoded offset fits in the mmap'ed window, but it forgot that the operation gives the pointer to the beginning of the asked region. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-29 07:29:06 +01:00			`unsigned long used_0;`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`/* there is at least 20 bytes left in the pack */`
			`switch (entry->in_pack_type) {`
			`case OBJ_REF_DELTA:`
Fix random segfaults in pack-objects. Junio noticed that 'non-trivial' pushes were failing if executed using the sliding window mmap changes. This was somewhat difficult to track down as the failure was appearing randomly. It turns out this was a failure caused by the delta base reference (either ref or offset format) spanning over the end of a mmap window. The error in pack-objects was we were not recalling use_pack after the object header was unpacked, and therefore we did not get the promise of at least 20 bytes in the buffer for the delta base parsing. This would case later memcmp() calls to walk into unassigned address space at the end of the window. The reason Junio and I had hard time tracking this down in current Git repositories is we were both probably packing with offset deltas, which minimized the odds of the delta base reference spanning over the end of the mmap window. Stepping back and repacking with version 1.3.3 (which only supported reference deltas) increased the likelyhood of seeing the bug. The correct technique (as used in sha1_file.c) is to invoke use_pack() after unpack_object_header_gently to ensure we have enough data available for the delta base decoding. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-27 08:46:23 +01:00			`base_name = use_pack(p, &w_curs,`
			`entry->in_pack_offset + used, NULL);`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`used += 20;`
			`break;`
			`case OBJ_OFS_DELTA:`
Fix random segfaults in pack-objects. Junio noticed that 'non-trivial' pushes were failing if executed using the sliding window mmap changes. This was somewhat difficult to track down as the failure was appearing randomly. It turns out this was a failure caused by the delta base reference (either ref or offset format) spanning over the end of a mmap window. The error in pack-objects was we were not recalling use_pack after the object header was unpacked, and therefore we did not get the promise of at least 20 bytes in the buffer for the delta base parsing. This would case later memcmp() calls to walk into unassigned address space at the end of the window. The reason Junio and I had hard time tracking this down in current Git repositories is we were both probably packing with offset deltas, which minimized the odds of the delta base reference spanning over the end of the mmap window. Stepping back and repacking with version 1.3.3 (which only supported reference deltas) increased the likelyhood of seeing the bug. The correct technique (as used in sha1_file.c) is to invoke use_pack() after unpack_object_header_gently to ensure we have enough data available for the delta base decoding. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-27 08:46:23 +01:00			`buf = use_pack(p, &w_curs,`
			`entry->in_pack_offset + used, NULL);`
pack-objects: fix use of use_pack(). The code calls use_pack() to make that the variably encoded offset fits in the mmap'ed window, but it forgot that the operation gives the pointer to the beginning of the asked region. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-29 07:29:06 +01:00			`used_0 = 0;`
			`c = buf[used_0++];`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`ofs = c & 127;`
			`while (c & 128) {`
			`ofs += 1;`
			`if (!ofs \|\| ofs & ~(~0UL >> 7))`
			`die("delta base offset overflow in pack for %s",`
			`sha1_to_hex(entry->sha1));`
pack-objects: fix use of use_pack(). The code calls use_pack() to make that the variably encoded offset fits in the mmap'ed window, but it forgot that the operation gives the pointer to the beginning of the asked region. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-29 07:29:06 +01:00			`c = buf[used_0++];`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`ofs = (ofs << 7) + (c & 127);`
			`}`
			`if (ofs >= entry->in_pack_offset)`
			`die("delta base offset out of bound for %s",`
			`sha1_to_hex(entry->sha1));`
			`ofs = entry->in_pack_offset - ofs;`
			`base_name = find_packed_object_name(p, ofs);`
pack-objects: fix use of use_pack(). The code calls use_pack() to make that the variably encoded offset fits in the mmap'ed window, but it forgot that the operation gives the pointer to the beginning of the asked region. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-29 07:29:06 +01:00			`used += used_0;`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`break;`
			`default:`
			`base_name = NULL;`
			`}`
			`if (base_name)`
			`base_entry = locate_object_entry(base_name);`
			`}`
Replace use_packed_git with window cursors. Part of the implementation concept of the sliding mmap window for pack access is to permit multiple windows per pack to be mapped independently. Since the inuse_cnt is associated with the mmap and not with the file, this value is in struct pack_window and needs to be incremented/decremented for each pack_window accessed by any code. To faciliate that implementation we need to replace all uses of use_packed_git() and unuse_packed_git() with a different API that follows struct pack_window objects rather than struct packed_git. The way this works is when we need to start accessing a pack for the first time we should setup a new window 'cursor' by declaring a local and setting it to NULL: struct pack_windows w_curs = NULL; To obtain the memory region which contains a specific section of the pack file we invoke use_pack(), supplying the address of our current window cursor: unsigned int len; unsigned char addr = use_pack(p, &w_curs, offset, &len); the returned address `addr` will be the first byte at `offset` within the pack file. The optional variable len will also be updated with the number of bytes remaining following the address. Multiple calls to use_pack() with the same window cursor will update the window cursor, moving it from one window to another when necessary. In this way each window cursor variable maintains only one struct pack_window inuse at a time. Finally before exiting the scope which originally declared the window cursor we must invoke unuse_pack() to unuse the current window (which may be different from the one that was first obtained from use_pack): unuse_pack(&w_curs); This implementation is still not complete with regards to multiple windows, as only one window per pack file is supported right now. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-23 08:34:08 +01:00			`unuse_pack(&w_curs);`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`entry->in_pack_header_size = used;`

allow delta data reuse even if base object is a preferred base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-27 21:42:16 +02:00			`if (base_entry) {`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
			`/* Depth value does not matter - find_deltas()`
			`* will never consider reused delta as the`
			`* base object to deltify other objects`
			`* against, in order to avoid circular deltas.`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`*/`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
			`/* uncompressed size of the delta data */`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`entry->size = size;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`entry->delta = base_entry;`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`entry->type = entry->in_pack_type;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`entry->delta_sibling = base_entry->delta_child;`
			`base_entry->delta_child = entry;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`return;`
			`}`
			`/* Otherwise we would do the usual */`
[PATCH] Enhance sha1_file_size() into sha1_object_info() This lets us eliminate one use of map_sha1_file() outside sha1_file.c, to bring us one step closer to the packed GIT. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-27 12:34:06 +02:00			`}`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`entry->type = sha1_object_info(entry->sha1, &entry->size);`
			`if (entry->type < 0)`
[PATCH] Enhance sha1_file_size() into sha1_object_info() This lets us eliminate one use of map_sha1_file() outside sha1_file.c, to bring us one step closer to the packed GIT. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-27 12:34:06 +02:00			`die("unable to get type of object %s",`
			`sha1_to_hex(entry->sha1));`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`

pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`static unsigned int check_delta_limit(struct object_entry *me, unsigned int n)`
			`{`
			`struct object_entry *child = me->delta_child;`
			`unsigned int m = n;`
			`while (child) {`
			`unsigned int c = check_delta_limit(child, n + 1);`
			`if (m < c)`
			`m = c;`
			`child = child->delta_sibling;`
			`}`
			`return m;`
			`}`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static void get_object_details(void)`
			`{`
			`int i;`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`struct object_entry *entry;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`prepare_pack_ix();`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`for (i = 0, entry = objects; i < nr_objects; i++, entry++)`
			`check_object(entry);`
pack-objects: allow "thin" packs to exceed depth limits When creating a new pack to be used in .git/objects/pack/ directory, we carefully count the depth of deltified objects to be reused, so that the generated pack does not to exceed the specified depth limit for runtime efficiency. However, when we are generating a thin pack that does not contain base objects, such a pack can only be used during network transfer that is expanded on the other end upon reception, so being careful and artificially cutting the delta chain does not buy us anything except increased bandwidth requirement. This patch disables the delta chain depth limit check when reusing an existing delta. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-24 08:04:52 +01:00
			`if (nr_objects == nr_result) {`
			`/*`
			`* Depth of objects that depend on the entry -- this`
			`* is subtracted from depth-max to break too deep`
			`* delta chain because of delta data reusing.`
			`* However, we loosen this restriction when we know we`
			`* are creating a thin pack -- it will have to be`
			`* expanded on the other end anyway, so do not`
			`* artificially cut the delta chain and let it go as`
			`* deep as it wants.`
			`*/`
			`for (i = 0, entry = objects; i < nr_objects; i++, entry++)`
			`if (!entry->delta && entry->delta_child)`
			`entry->delta_limit =`
			`check_delta_limit(entry, 1);`
			`}`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

			`typedef int (entry_sort_t)(const struct object_entry , const struct object_entry *);`

			`static entry_sort_t current_sort;`

			`static int sort_comparator(const void _a, const void _b)`
			`{`
			`struct object_entry a = (struct object_entry **)_a;`
			`struct object_entry b = (struct object_entry **)_b;`
			`return current_sort(a,b);`
			`}`

			`static struct object_entry **create_sorted_list(entry_sort_t sort)`
			`{`
			`struct object_entry *list = xmalloc(nr_objects sizeof(struct object_entry *));`
			`int i;`

			`for (i = 0; i < nr_objects; i++)`
			`list[i] = objects + i;`
			`current_sort = sort;`
			`qsort(list, nr_objects, sizeof(struct object_entry *), sort_comparator);`
			`return list;`
			`}`

			`static int sha1_sort(const struct object_entry a, const struct object_entry b)`
			`{`
Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char sha1, const unsigned char sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-17 20:54:57 +02:00			`return hashcmp(a->sha1, b->sha1);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

Use setenv(), fix warnings - Fix -Wundef -Wold-style-definition warnings - Make pll_free() static [jc: original patch by Timo had another unrelated bits: - Use setenv() instead of putenv() I'm postponing that part for now.] Signed-off-by: Timo Hirvonen <tihirvon@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 16:13:46 +01:00			`static struct object_entry **create_final_object_list(void)`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`{`
			`struct object_entry **list;`
			`int i, j;`

			`for (i = nr_result = 0; i < nr_objects; i++)`
			`if (!objects[i].preferred_base)`
			`nr_result++;`
			`list = xmalloc(nr_result * sizeof(struct object_entry *));`
			`for (i = j = 0; i < nr_objects; i++) {`
			`if (!objects[i].preferred_base)`
			`list[j++] = objects + i;`
			`}`
			`current_sort = sha1_sort;`
			`qsort(list, nr_result, sizeof(struct object_entry *), sort_comparator);`
			`return list;`
			`}`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static int type_size_sort(const struct object_entry a, const struct object_entry b)`
			`{`
			`if (a->type < b->type)`
			`return -1;`
			`if (a->type > b->type)`
			`return 1;`
git-pack-objects: use name information (if any) to sort objects for packing. This is incredibly cheezy. But it's cheap, and it works pretty well. 2005-06-27 00:27:28 +02:00			`if (a->hash < b->hash)`
			`return -1;`
			`if (a->hash > b->hash)`
			`return 1;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (a->preferred_base < b->preferred_base)`
			`return -1;`
			`if (a->preferred_base > b->preferred_base)`
			`return 1;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`if (a->size < b->size)`
			`return -1;`
			`if (a->size > b->size)`
			`return 1;`
			`return a < b ? -1 : (a > b);`
			`}`

			`struct unpacked {`
			`struct object_entry *entry;`
			`void *data;`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`struct delta_index *index;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`};`

			`/*`
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`* We search for deltas _backwards_ in a list sorted by type and`
			`* by size, so that we see progressively smaller and smaller files.`
			`* That's because we prefer deltas to be from the bigger file`
			`* to the smaller - deletes are potentially cheaper, but perhaps`
			`* more importantly, the bigger file is likely the more recent`
			`* one.`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`*/`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`static int try_delta(struct unpacked trg, struct unpacked src,`
don't load objects needlessly when repacking If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-01 04:55:30 +02:00			`unsigned max_depth)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`{`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`struct object_entry *trg_entry = trg->entry;`
			`struct object_entry *src_entry = src->entry;`
don't load objects needlessly when repacking If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-01 04:55:30 +02:00			`unsigned long trg_size, src_size, delta_size, sizediff, max_size, sz;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`void *delta_buf;`

			`/* Don't bother doing diffs between different types */`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`if (trg_entry->type != src_entry->type)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`return -1;`

Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`/* We do not compute delta to create objects we are not`
			`* going to pack.`
			`*/`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`if (trg_entry->preferred_base)`
Add a "max_size" parameter to diff_delta() Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens. 2005-06-26 04:30:20 +02:00			`return -1;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00
Do not try futile object pairs when repacking. In the repacking window, if both objects we are looking at already came from the same (old) pack-file, don't bother delta'ing them against each other. That means that we'll still always check for better deltas for (and against!) _unpacked_ objects, but assuming incremental repacks, you'll avoid the delta creation 99% of the time. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-29 23:04:01 +02:00			`/*`
			`* We do not bother to try a delta that we discarded`
consider previous pack undeltified object state only when reusing delta data Without this there would never be a chance to improve packing for previously undeltified objects. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 05:44:52 +02:00			`* on an earlier try, but only when reusing delta data.`
Do not try futile object pairs when repacking. In the repacking window, if both objects we are looking at already came from the same (old) pack-file, don't bother delta'ing them against each other. That means that we'll still always check for better deltas for (and against!) _unpacked_ objects, but assuming incremental repacks, you'll avoid the delta creation 99% of the time. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-29 23:04:01 +02:00			`*/`
consider previous pack undeltified object state only when reusing delta data Without this there would never be a chance to improve packing for previously undeltified objects. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 05:44:52 +02:00			`if (!no_reuse_delta && trg_entry->in_pack &&`
pack-objects: tweak "do not even attempt delta" heuristics The heuristics to give up deltification when both the source and the target are both in the same pack affects negatively when we are repacking the subset of objects in the existing pack. This caused any incremental updates to use suboptimal packs. Tweak the heuristics to avoid this problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-15 07:18:31 +01:00			`trg_entry->in_pack == src_entry->in_pack &&`
			`trg_entry->in_pack_type != OBJ_REF_DELTA &&`
			`trg_entry->in_pack_type != OBJ_OFS_DELTA)`
Do not try futile object pairs when repacking. In the repacking window, if both objects we are looking at already came from the same (old) pack-file, don't bother delta'ing them against each other. That means that we'll still always check for better deltas for (and against!) _unpacked_ objects, but assuming incremental repacks, you'll avoid the delta creation 99% of the time. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-29 23:04:01 +02:00			`return 0;`

use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`/*`
			`* If the current object is at pack edge, take the depth the`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`* objects that depend on the current object into account --`
			`* otherwise they would become too deep.`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`*/`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`if (trg_entry->delta_child) {`
			`if (max_depth <= trg_entry->delta_limit)`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`return 0;`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`max_depth -= trg_entry->delta_limit;`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`}`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`if (src_entry->depth >= max_depth)`
Add "--depth=N" parameter to git-pack-objects to limit maximum delta depth It too defaults to 10. A nice round random number. 2005-06-26 05:17:59 +02:00			`return 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
improve depth heuristic for maximum delta size This provides a linear decrement on the penalty related to delta depth instead of being an 1/x function. With this another 5% reduction is observed on packs for both the GIT repo and the Linux kernel repo, as well as fixing a pack size regression in another sample repo I have. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-16 22:29:14 +02:00			`/* Now some size filtering heuristics. */`
don't load objects needlessly when repacking If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-01 04:55:30 +02:00			`trg_size = trg_entry->size;`
			`max_size = trg_size/2 - 20;`
improve depth heuristic for maximum delta size This provides a linear decrement on the penalty related to delta depth instead of being an 1/x function. With this another 5% reduction is observed on packs for both the GIT repo and the Linux kernel repo, as well as fixing a pack size regression in another sample repo I have. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-16 22:29:14 +02:00			`max_size = max_size * (max_depth - src_entry->depth) / max_depth;`
			`if (max_size == 0)`
			`return 0;`
simple euristic for further free packing improvements Given that the early eviction of objects with maximum delta depth may exhibit bad packing on its own, why not considering a bias against deep base objects in try_delta() to mitigate that bad behavior. This patch adjust the MAX_size allowed for a delta based on the depth of the base object as well as enabling the early eviction of max depth objects from the object window. When used separately, those two things produce slightly better and much worse results respectively. But their combined effect is a surprising significant packing improvement. With this really simple patch the GIT repo gets nearly 15% smaller, and the Linux kernel repo about 5% smaller, with no significantly measurable CPU usage difference. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-15 17:40:05 +02:00			`if (trg_entry->delta && trg_entry->delta_size <= max_size)`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`max_size = trg_entry->delta_size-1;`
			`src_size = src_entry->size;`
don't load objects needlessly when repacking If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-01 04:55:30 +02:00			`sizediff = src_size < trg_size ? trg_size - src_size : 0;`
git-pack-objects: use name information (if any) to sort objects for packing. This is incredibly cheezy. But it's cheap, and it works pretty well. 2005-06-27 00:27:28 +02:00			`if (sizediff >= max_size)`
pack-objects: do not stop at object that is "too small" Because we sort the delta window by name-hash and then size, hitting an object that is too small to consider as a delta base for the current object does not mean we do not have better candidate in the window beyond it. Noticed by Shawn Pearce, analyzed by Nico, Linus and me. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-21 08:36:22 +02:00			`return 0;`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00
don't load objects needlessly when repacking If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-01 04:55:30 +02:00			`/* Load data if not already done */`
			`if (!trg->data) {`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`trg->data = read_sha1_file(trg_entry->sha1, &type, &sz);`
don't load objects needlessly when repacking If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-01 04:55:30 +02:00			`if (sz != trg_size)`
			`die("object %s inconsistent object length (%lu vs %lu)",`
			`sha1_to_hex(trg_entry->sha1), sz, trg_size);`
			`}`
			`if (!src->data) {`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`src->data = read_sha1_file(src_entry->sha1, &type, &sz);`
don't load objects needlessly when repacking If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-01 04:55:30 +02:00			`if (sz != src_size)`
			`die("object %s inconsistent object length (%lu vs %lu)",`
			`sha1_to_hex(src_entry->sha1), sz, src_size);`
			`}`
			`if (!src->index) {`
			`src->index = create_delta_index(src->data, src_size);`
			`if (!src->index)`
			`die("out of memory");`
			`}`

			`delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`if (!delta_buf)`
Add a "max_size" parameter to diff_delta() Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens. 2005-06-26 04:30:20 +02:00			`return 0;`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00
			`trg_entry->delta = src_entry;`
			`trg_entry->delta_size = delta_size;`
			`trg_entry->depth = src_entry->depth + 1;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`free(delta_buf);`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`return 1;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

nicer eye candies for pack-objects This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 22:00:08 +01:00			`static void progress_interval(int signum)`
			`{`
			`progress_update = 1;`
			`}`

Add "--depth=N" parameter to git-pack-objects to limit maximum delta depth It too defaults to 10. A nice round random number. 2005-06-26 05:17:59 +02:00			`static void find_deltas(struct object_entry **list, int window, int depth)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`{`
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`int i, idx;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`unsigned int array_size = window * sizeof(struct unpacked);`
			`struct unpacked *array = xmalloc(array_size);`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`unsigned processed = 0;`
			`unsigned last_percent = 999;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
			`memset(array, 0, array_size);`
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`i = nr_objects;`
			`idx = 0;`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (progress)`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`fprintf(stderr, "Deltifying %d objects.\n", nr_result);`
fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:54:18 +01:00
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`while (--i >= 0) {`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`struct object_entry *entry = list[i];`
			`struct unpacked *n = array + idx;`
			`int j;`

Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`if (!entry->preferred_base)`
			`processed++;`

pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (progress) {`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`unsigned percent = processed * 100 / nr_result;`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (percent != last_percent \|\| progress_update) {`
			`fprintf(stderr, "%4u%% (%u/%u) done\r",`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`percent, processed, nr_result);`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`progress_update = 0;`
			`last_percent = percent;`
			`}`
fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:54:18 +01:00			`}`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00
			`if (entry->delta)`
			`/* This happens if we decided to reuse existing`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`* delta from a pack. "!no_reuse_delta &&" is implied.`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`*/`
			`continue;`

pack-objects: update size heuristucs. We used to omit delta base candidates that is much bigger than the target, but delta size does not grow when we delete more, so that was not a very good heuristics. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-28 04:31:46 +02:00			`if (entry->size < 50)`
			`continue;`
pack-object: slightly more efficient Avoid creating a delta index for objects with maximum depth since they are not going to be used as delta base anyway. This also reduce peak memory usage slightly as the current object's delta index is not useful until the next object in the loop is considered for deltification. This saves a bit more than 1% on CPU usage. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-15 19:47:16 +02:00			`free_delta_index(n->index);`
			`n->index = NULL;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`free(n->data);`
don't load objects needlessly when repacking If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-01 04:55:30 +02:00			`n->data = NULL;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`n->entry = entry;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
Fix delta "sliding window" code When Junio fixed the lack of a successful error code from try_delta(), that uncovered an off-by-one error in the caller. Also, some testing made it clear that we now find a lot more deltas, because we used to (incorrectly) break early on bogus "failure" cases. 2005-06-26 03:29:23 +02:00			`j = window;`
			`while (--j > 0) {`
			`unsigned int other_idx = idx + j;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`struct unpacked *m;`
Fix delta "sliding window" code When Junio fixed the lack of a successful error code from try_delta(), that uncovered an off-by-one error in the caller. Also, some testing made it clear that we now find a lot more deltas, because we used to (incorrectly) break early on bogus "failure" cases. 2005-06-26 03:29:23 +02:00			`if (other_idx >= window)`
			`other_idx -= window;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`m = array + other_idx;`
			`if (!m->entry)`
			`break;`
don't load objects needlessly when repacking If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-01 04:55:30 +02:00			`if (try_delta(n, m, depth) < 0)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`break;`
			`}`
pack-objects: simplify "thin" pack. There was a misguided logic to overly prefer using objects that we are not going to pack as the base object. This was unnecessary. It does not matter to the unpacking side where the base object is -- it matters more to make the resulting delta smaller. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-05 20:22:57 +01:00			`/* if we made n a delta, and if n is already at max`
			`* depth, leaving it in the window is pointless. we`
			`* should evict it first.`
			`*/`
			`if (entry->delta && depth <= entry->depth)`
			`continue;`
pack-object: slightly more efficient Avoid creating a delta index for objects with maximum depth since they are not going to be used as delta base anyway. This also reduce peak memory usage slightly as the current object's delta index is not useful until the next object in the loop is considered for deltification. This saves a bit more than 1% on CPU usage. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-15 19:47:16 +02:00
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`idx++;`
			`if (idx >= window)`
			`idx = 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`
[PATCH] Plug memory leak in git-pack-objects find_deltas() should free its temporary objects before returning. [jc: Sergey, if you have [PATCH] title on the Subject line of your e-mail, please do not repeat it on the first line in your message body. Thanks.] Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-08-08 20:46:58 +02:00
nicer eye candies for pack-objects This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 22:00:08 +01:00			`if (progress)`
			`fputc('\n', stderr);`

use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`for (i = 0; i < window; ++i) {`
pack-object: slightly more efficient Avoid creating a delta index for objects with maximum depth since they are not going to be used as delta base anyway. This also reduce peak memory usage slightly as the current object's delta index is not useful until the next object in the loop is considered for deltification. This saves a bit more than 1% on CPU usage. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-15 19:47:16 +02:00			`free_delta_index(array[i].index);`
[PATCH] Plug memory leak in git-pack-objects find_deltas() should free its temporary objects before returning. [jc: Sergey, if you have [PATCH] title on the Subject line of your e-mail, please do not repeat it on the first line in your message body. Thanks.] Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-08-08 20:46:58 +02:00			`free(array[i].data);`
use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-27 05:58:00 +02:00			`}`
[PATCH] Plug memory leak in git-pack-objects find_deltas() should free its temporary objects before returning. [jc: Sergey, if you have [PATCH] title on the Subject line of your e-mail, please do not repeat it on the first line in your message body. Thanks.] Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-08-08 20:46:58 +02:00			`free(array);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-22 10:28:13 +02:00			`static void prepare_pack(int window, int depth)`
			`{`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`get_object_details();`
pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-22 10:28:13 +02:00			`sorted_by_type = create_sorted_list(type_size_sort);`
			`if (window && depth)`
			`find_deltas(sorted_by_type, window+1, depth);`
			`}`

pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-02 00:05:12 +02:00			`static int reuse_cached_pack(unsigned char *sha1)`
pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-22 10:28:13 +02:00			`{`
			`static const char cache[] = "pack-cache/pack-%s.%s";`
			`char cached_pack, cached_idx;`
			`int ifd, ofd, ifd_ix = -1;`

			`cached_pack = git_path(cache, sha1_to_hex(sha1), "pack");`
			`ifd = open(cached_pack, O_RDONLY);`
			`if (ifd < 0)`
			`return 0;`

			`if (!pack_to_stdout) {`
			`cached_idx = git_path(cache, sha1_to_hex(sha1), "idx");`
			`ifd_ix = open(cached_idx, O_RDONLY);`
			`if (ifd_ix < 0) {`
			`close(ifd);`
			`return 0;`
			`}`
			`}`

pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`if (progress)`
			`fprintf(stderr, "Reusing %d objects pack %s\n", nr_objects,`
			`sha1_to_hex(sha1));`
pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-22 10:28:13 +02:00
			`if (pack_to_stdout) {`
			`if (copy_fd(ifd, 1))`
			`exit(1);`
			`close(ifd);`
			`}`
			`else {`
			`char name[PATH_MAX];`
			`snprintf(name, sizeof(name),`
			`"%s-%s.%s", base_name, sha1_to_hex(sha1), "pack");`
			`ofd = open(name, O_CREAT \| O_EXCL \| O_WRONLY, 0666);`
			`if (ofd < 0)`
			`die("unable to open %s (%s)", name, strerror(errno));`
			`if (copy_fd(ifd, ofd))`
			`exit(1);`
			`close(ifd);`

			`snprintf(name, sizeof(name),`
			`"%s-%s.%s", base_name, sha1_to_hex(sha1), "idx");`
			`ofd = open(name, O_CREAT \| O_EXCL \| O_WRONLY, 0666);`
			`if (ofd < 0)`
			`die("unable to open %s (%s)", name, strerror(errno));`
			`if (copy_fd(ifd_ix, ofd))`
			`exit(1);`
			`close(ifd_ix);`
			`puts(sha1_to_hex(sha1));`
			`}`

			`return 1;`
			`}`

Fix Solaris stdio signal handling stupidities This uses sigaction() to install the SIGALRM handler with SA_RESTART, so that Solaris stdio doesn't break completely when a signal interrupts a read. Thanks to Jason Riedy for confirming the silly Solaris signal behaviour. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-02 22:28:27 +02:00			`static void setup_progress_signal(void)`
			`{`
			`struct sigaction sa;`
			`struct itimerval v;`

			`memset(&sa, 0, sizeof(sa));`
			`sa.sa_handler = progress_interval;`
			`sigemptyset(&sa.sa_mask);`
			`sa.sa_flags = SA_RESTART;`
			`sigaction(SIGALRM, &sa, NULL);`

			`v.it_interval.tv_sec = 1;`
			`v.it_interval.tv_usec = 0;`
			`v.it_value = v.it_interval;`
			`setitimer(ITIMER_REAL, &v, NULL);`
			`}`

pack-objects: check pack.window for default window size For some repositories, deltas simply don't make sense. One can disable them for git-repack by adding --window, but git-push insists on making the deltas which can be very CPU-intensive for little benefit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-23 07:50:30 +02:00			`static int git_pack_config(const char k, const char v)`
			`{`
			`if(!strcmp(k, "pack.window")) {`
			`window = git_config_int(k, v);`
			`return 0;`
			`}`
			`return git_default_config(k, v);`
			`}`

pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`static void read_object_list_from_stdin(void)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`{`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`char line[40 + 1 + PATH_MAX + 2];`
			`unsigned char sha1[20];`
			`unsigned hash;`
nicer eye candies for pack-objects This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 22:00:08 +01:00
pack-objects: be incredibly anal about stdio semantics This is the "letter of the law" version of using fgets() properly in the face of incredibly broken stdio implementations. We can work around the Solaris breakage with SA_RESTART, but in case anybody else is ever that stupid, here's the "safe" (read: "insanely anal") way to use fgets. It probably goes without saying that I'm not terribly impressed by Solaris libc. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-02 22:31:54 +02:00			`for (;;) {`
			`if (!fgets(line, sizeof(line), stdin)) {`
			`if (feof(stdin))`
			`break;`
			`if (!ferror(stdin))`
			`die("fgets returned NULL, not EOF, not error!");`
safe_fgets() - even more anal fgets() This is from Linus -- the previous round forgot to clear error after EINTR case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-04 08:41:09 +02:00			`if (errno != EINTR)`
			`die("fgets: %s", strerror(errno));`
			`clearerr(stdin);`
			`continue;`
pack-objects: be incredibly anal about stdio semantics This is the "letter of the law" version of using fgets() properly in the face of incredibly broken stdio implementations. We can work around the Solaris breakage with SA_RESTART, but in case anybody else is ever that stupid, here's the "safe" (read: "insanely anal") way to use fgets. It probably goes without saying that I'm not terribly impressed by Solaris libc. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-02 22:31:54 +02:00			`}`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (line[0] == '-') {`
			`if (get_sha1_hex(line+1, sha1))`
			`die("expected edge sha1, got garbage:\n %s",`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`line);`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`add_preferred_base(sha1);`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`continue;`
fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:54:18 +01:00			`}`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`if (get_sha1_hex(line, sha1))`
git-repack: Properly abort in corrupt repository In a corrupt repository, git-repack produces a pack that does not contain needed objects without complaining, and the result of this combined with -d flag can be very painful -- e.g. a lossage of one tree object can lead to lossage of blobs reachable only through that tree. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-11-21 21:38:31 +01:00			`die("expected sha1, got garbage:\n %s", line);`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00
pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-05 21:03:31 +02:00			`hash = name_hash(line+41);`
Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-06 08:24:57 +02:00			`add_preferred_base_object(line+41, hash);`
			`add_object_entry(sha1, hash, 0);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`}`

			`static void show_commit(struct commit *commit)`
			`{`
			`unsigned hash = name_hash("");`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`add_preferred_base_object("", hash);`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`add_object_entry(commit->object.sha1, hash, 0);`
			`}`

			`static void show_object(struct object_array_entry *p)`
			`{`
			`unsigned hash = name_hash(p->name);`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`add_preferred_base_object(p->name, hash);`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`add_object_entry(p->item->sha1, hash, 0);`
			`}`

pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`static void show_edge(struct commit *commit)`
			`{`
			`add_preferred_base(commit->object.sha1);`
			`}`

			`static void get_object_list(int ac, const char **av)`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`{`
			`struct rev_info revs;`
			`char line[1000];`
			`int flags = 0;`

			`init_revisions(&revs, NULL);`
			`save_commit_buffer = 0;`
			`track_object_refs = 0;`
			`setup_revisions(ac, av, &revs, NULL);`

			`while (fgets(line, sizeof(line), stdin) != NULL) {`
			`int len = strlen(line);`
			`if (line[len - 1] == '\n')`
			`line[--len] = 0;`
			`if (!len)`
			`break;`
			`if (*line == '-') {`
			`if (!strcmp(line, "--not")) {`
			`flags ^= UNINTERESTING;`
			`continue;`
			`}`
			`die("not a rev '%s'", line);`
			`}`
			`if (handle_revision_arg(line, &revs, flags, 1))`
			`die("bad revision '%s'", line);`
			`}`

			`prepare_revision_walk(&revs);`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`mark_edges_uninteresting(revs.commits, &revs, show_edge);`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`traverse_commit_list(&revs, show_commit, show_object);`
			`}`

			`int cmd_pack_objects(int argc, const char *argv, const char prefix)`
			`{`
			`SHA_CTX ctx;`
			`int depth = 10;`
			`struct object_entry **list;`
			`int use_internal_rev_list = 0;`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`int thin = 0;`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`int i;`
Allow arbitrary number of arguments to git-pack-objects If a repository ever gets in a situation where there are too many packs (more than 60 or so), perhaps because of frequent use of git-fetch -k or incremental git-repack, then it becomes impossible to fully repack the repository with git-repack -a. That command just dies with the cryptic message fatal: too many internal rev-list options This message comes from git-pack-objects, which is passed one command line option like --unpacked=pack-<SHA1>.pack for each pack file to be repacked. However, the current code has a static limit of 64 command line arguments and just aborts if more arguments are passed to it. Fix this by dynamically allocating the array of command line arguments, and doubling the size each time it overflows. Signed-off-by: Roland Dreier <roland@digitalvampire.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-25 18:34:27 +01:00			`const char **rp_av;`
			`int rp_ac_alloc = 64;`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`int rp_ac;`

Allow arbitrary number of arguments to git-pack-objects If a repository ever gets in a situation where there are too many packs (more than 60 or so), perhaps because of frequent use of git-fetch -k or incremental git-repack, then it becomes impossible to fully repack the repository with git-repack -a. That command just dies with the cryptic message fatal: too many internal rev-list options This message comes from git-pack-objects, which is passed one command line option like --unpacked=pack-<SHA1>.pack for each pack file to be repacked. However, the current code has a static limit of 64 command line arguments and just aborts if more arguments are passed to it. Fix this by dynamically allocating the array of command line arguments, and doubling the size each time it overflows. Signed-off-by: Roland Dreier <roland@digitalvampire.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-25 18:34:27 +01:00			`rp_av = xcalloc(rp_ac_alloc, sizeof(*rp_av));`

pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`rp_av[0] = "pack-objects";`
			`rp_av[1] = "--objects"; /* --thin will make it --objects-edge */`
			`rp_ac = 2;`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00
			`git_config(git_pack_config);`

			`progress = isatty(2);`
			`for (i = 1; i < argc; i++) {`
			`const char *arg = argv[i];`

			`if (*arg != '-')`
			`break;`

			`if (!strcmp("--non-empty", arg)) {`
			`non_empty = 1;`
			`continue;`
			`}`
			`if (!strcmp("--local", arg)) {`
			`local = 1;`
			`continue;`
			`}`
			`if (!strcmp("--incremental", arg)) {`
			`incremental = 1;`
			`continue;`
			`}`
prefixcmp(): fix-up mechanical conversion. Previous step converted use of strncmp() with literal string mechanically even when the result is only used as a boolean: if (!strncmp("foo", arg, 3)) ==> if (!(-prefixcmp(arg, "foo"))) This step manually cleans them up to read: if (!prefixcmp(arg, "foo")) Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-20 10:54:00 +01:00			`if (!prefixcmp(arg, "--window=")) {`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`char *end;`
			`window = strtoul(arg+9, &end, 0);`
			`if (!arg[9] \|\| *end)`
			`usage(pack_usage);`
			`continue;`
			`}`
prefixcmp(): fix-up mechanical conversion. Previous step converted use of strncmp() with literal string mechanically even when the result is only used as a boolean: if (!strncmp("foo", arg, 3)) ==> if (!(-prefixcmp(arg, "foo"))) This step manually cleans them up to read: if (!prefixcmp(arg, "foo")) Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-20 10:54:00 +01:00			`if (!prefixcmp(arg, "--depth=")) {`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`char *end;`
			`depth = strtoul(arg+8, &end, 0);`
			`if (!arg[8] \|\| *end)`
			`usage(pack_usage);`
			`continue;`
			`}`
			`if (!strcmp("--progress", arg)) {`
			`progress = 1;`
			`continue;`
			`}`
git-pack-objects progress flag documentation and cleanup This adds documentation for --progress and --all-progress, remove a duplicate --progress handling and make usage string more readable. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-07 16:51:23 +01:00			`if (!strcmp("--all-progress", arg)) {`
			`progress = 2;`
			`continue;`
			`}`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`if (!strcmp("-q", arg)) {`
			`progress = 0;`
			`continue;`
			`}`
			`if (!strcmp("--no-reuse-delta", arg)) {`
			`no_reuse_delta = 1;`
			`continue;`
			`}`
make git-pack-objects able to create deltas with offset to base This is enabled with --delta-base-offset only, and doesn't work with pack data reuse yet. The idea is to allow for the fetch protocol to use an extension flag to notify the remote end that --delta-base-offset can be used with git-pack-objects. Eventually git-repack will always provide this flag. With this, all delta base objects are now pushed before deltas that depend on them. This is a requirements for OBJ_OFS_DELTA. This is not a requirement for OBJ_REF_DELTA but always doing so makes the code simpler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:09:44 +02:00			`if (!strcmp("--delta-base-offset", arg)) {`
make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-23 03:25:04 +02:00			`allow_ofs_delta = 1;`
make git-pack-objects able to create deltas with offset to base This is enabled with --delta-base-offset only, and doesn't work with pack data reuse yet. The idea is to allow for the fetch protocol to use an extension flag to notify the remote end that --delta-base-offset can be used with git-pack-objects. Eventually git-repack will always provide this flag. With this, all delta base objects are now pushed before deltas that depend on them. This is a requirements for OBJ_OFS_DELTA. This is not a requirement for OBJ_REF_DELTA but always doing so makes the code simpler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:09:44 +02:00			`continue;`
			`}`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`if (!strcmp("--stdout", arg)) {`
			`pack_to_stdout = 1;`
			`continue;`
			`}`
			`if (!strcmp("--revs", arg)) {`
			`use_internal_rev_list = 1;`
			`continue;`
			`}`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`if (!strcmp("--unpacked", arg) \|\|`
prefixcmp(): fix-up mechanical conversion. Previous step converted use of strncmp() with literal string mechanically even when the result is only used as a boolean: if (!strncmp("foo", arg, 3)) ==> if (!(-prefixcmp(arg, "foo"))) This step manually cleans them up to read: if (!prefixcmp(arg, "foo")) Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-20 10:54:00 +01:00			`!prefixcmp(arg, "--unpacked=") \|\|`
Teach git-repack to preserve objects referred to by reflog entries. This adds a new option --reflog to pack-objects and revision machinery; do not bother documenting it for now, since this is only useful for local repacking. When the option is passed, objects reachable from reflog entries are marked as interesting while computing the set of objects to pack. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-19 02:25:28 +01:00			`!strcmp("--reflog", arg) \|\|`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`!strcmp("--all", arg)) {`
			`use_internal_rev_list = 1;`
Allow arbitrary number of arguments to git-pack-objects If a repository ever gets in a situation where there are too many packs (more than 60 or so), perhaps because of frequent use of git-fetch -k or incremental git-repack, then it becomes impossible to fully repack the repository with git-repack -a. That command just dies with the cryptic message fatal: too many internal rev-list options This message comes from git-pack-objects, which is passed one command line option like --unpacked=pack-<SHA1>.pack for each pack file to be repacked. However, the current code has a static limit of 64 command line arguments and just aborts if more arguments are passed to it. Fix this by dynamically allocating the array of command line arguments, and doubling the size each time it overflows. Signed-off-by: Roland Dreier <roland@digitalvampire.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-25 18:34:27 +01:00			`if (rp_ac >= rp_ac_alloc - 1) {`
			`rp_ac_alloc = alloc_nr(rp_ac_alloc);`
			`rp_av = xrealloc(rp_av,`
			`rp_ac_alloc * sizeof(*rp_av));`
			`}`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`rp_av[rp_ac++] = arg;`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`continue;`
			`}`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`if (!strcmp("--thin", arg)) {`
			`use_internal_rev_list = 1;`
			`thin = 1;`
			`rp_av[1] = "--objects-edge";`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00			`continue;`
			`}`
			`usage(pack_usage);`
			`}`

			`/* Traditionally "pack-objects [options] base extra" failed;`
			`* we would however want to take refs parameter that would`
			`* have been given to upstream rev-list ourselves, which means`
			`* we somehow want to say what the base name is. So the`
			`* syntax would be:`
			`*`
			`* pack-objects [options] base <refs...>`
			`*`
			`* in other words, we would treat the first non-option as the`
			`* base_name and send everything else to the internal revision`
			`* walker.`
			`*/`

			`if (!pack_to_stdout)`
			`base_name = argv[i++];`

			`if (pack_to_stdout != !base_name)`
			`usage(pack_usage);`

pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`if (!pack_to_stdout && thin)`
			`die("--thin cannot be used to build an indexable pack.");`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00
			`prepare_packed_git();`

			`if (progress) {`
			`fprintf(stderr, "Generating pack...\n");`
			`setup_progress_signal();`
			`}`

			`if (!use_internal_rev_list)`
			`read_object_list_from_stdin();`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`else {`
			`rp_av[rp_ac] = NULL;`
			`get_object_list(rp_ac, rp_av);`
			`}`
pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 08:47:39 +02:00
fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:54:18 +01:00			`if (progress)`
			`fprintf(stderr, "Done counting %d objects.\n", nr_objects);`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`sorted_by_sha = create_final_object_list();`
			`if (non_empty && !nr_result)`
Add "--non-empty" flag to git-pack-objects It skips writing the pack-file if it ends up being empty. 2005-07-03 22:36:58 +02:00			`return 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
Fix packname hash generation. This changes the generation of hash packfiles have in their names, from "hash of object names as fed to us" to "hash of object names in the resulting pack, in the order they appear in the index file". The new "git-index-pack" command is taught to output the computed hash value to its standard output. With this, we can store downloaded pack in a temporary file without knowing its final name, run git-index-pack to generate idx for it while finding out its final name, and then rename the pack and idx to their final names. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-13 01:54:19 +02:00			`SHA1_Init(&ctx);`
			`list = sorted_by_sha;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`for (i = 0; i < nr_result; i++) {`
Fix packname hash generation. This changes the generation of hash packfiles have in their names, from "hash of object names as fed to us" to "hash of object names in the resulting pack, in the order they appear in the index file". The new "git-index-pack" command is taught to output the computed hash value to its standard output. With this, we can store downloaded pack in a temporary file without knowing its final name, run git-index-pack to generate idx for it while finding out its final name, and then rename the pack and idx to their final names. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-13 01:54:19 +02:00			`struct object_entry entry = list++;`
			`SHA1_Update(&ctx, entry->sha1, 20);`
			`}`
			`SHA1_Final(object_list_sha1, &ctx);`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (progress && (nr_objects != nr_result))`
			`fprintf(stderr, "Result has %d objects.\n", nr_result);`
Fix packname hash generation. This changes the generation of hash packfiles have in their names, from "hash of object names as fed to us" to "hash of object names in the resulting pack, in the order they appear in the index file". The new "git-index-pack" command is taught to output the computed hash value to its standard output. With this, we can store downloaded pack in a temporary file without knowing its final name, run git-index-pack to generate idx for it while finding out its final name, and then rename the pack and idx to their final names. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-13 01:54:19 +02:00
pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-02 00:05:12 +02:00			`if (reuse_cached_pack(object_list_sha1))`
pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-22 10:28:13 +02:00			`;`
			`else {`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`if (nr_result)`
			`prepare_pack(window, depth);`
make git-push a bit more verbose Currently git-push displays progress status for the local packing of objects to send, but nothing once it starts to push it over the connection. Having progress status in that later case is especially nice when pushing lots of objects over a slow network link. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 22:58:32 +01:00			`if (progress == 1 && pack_to_stdout) {`
also adds progress when actually writing a pack If that pack is big, it takes significant time to write and might benefit from some more eye candies as well. This is however disabled when the pack is written to stdout since in that case the output is usually piped into unpack_objects which already does its own progress reporting. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 23:41:32 +01:00			`/* the other end usually displays progress itself */`
			`struct itimerval v = {{0,},};`
			`setitimer(ITIMER_REAL, &v, NULL);`
			`signal(SIGALRM, SIG_IGN );`
			`progress_update = 0;`
			`}`
			`write_pack_file();`
pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-22 10:28:13 +02:00			`if (!pack_to_stdout) {`
			`write_index_file();`
			`puts(sha1_to_hex(object_list_sha1));`
			`}`
Make the name of a pack-file depend on the objects packed there-in. This means that the .git/objects/pack directory is also rsync'able, since the filenames created there-in are either unique or refer to the same data. Otherwise you might not be able to pull from a directory that is partly packed without having to worry about missing objects due to pack-file name clashes. 2005-07-04 00:34:04 +02:00			`}`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`if (progress)`
pack-objects: remove redundent status information The final 'nr_result' and 'written' values must always be the same otherwise we're in deep trouble. So let's remove a redundent report. And for paranoia sake let's make sure those two variables are actually equal after all objects are written (one never knows). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-29 23:15:48 +01:00			`fprintf(stderr, "Total %d (delta %d), reused %d (delta %d)\n",`
			`written, written_delta, reused, reused_delta);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`return 0;`
			`}`