mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-18 15:04:49 +01:00

1171 lines

29 KiB

C

Raw Normal View History

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`#include "cache.h"`
			`#include "object.h"`
			`#include "delta.h"`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`#include "pack.h"`
git-pack-objects: write the pack files with a SHA1 csum We want to be able to check their integrity later, and putting the sha1-sum of the contents at the end is a good thing. The writing routines are generic, so we could try to re-use them for the index file, instead of having the same logic duplicated. Update unpack-objects to know about the extra 20 bytes at the end of the index. 2005-06-27 05:27:56 +02:00			`#include "csum-file.h"`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`#include "diff.h"`
fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:54:18 +01:00			`#include <sys/time.h>`
nicer eye candies for pack-objects This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 22:00:08 +01:00			`#include <signal.h>`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`static const char pack_usage[] = "git-pack-objects [-q] [--no-reuse-delta] [--non-empty] [--local] [--incremental] [--window=N] [--depth=N] {--stdout \| base-name} < object-list";`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
			`struct object_entry {`
			`unsigned char sha1[20];`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`unsigned long size; /* uncompressed size */`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`unsigned long offset; /* offset into the final pack file;`
			`* nonzero if already written.`
			`*/`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`unsigned int depth; /* delta depth */`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`unsigned int delta_limit; /* base adjustment for in-pack delta */`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`unsigned int hash; /* name hint hash */`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`enum object_type type;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`enum object_type in_pack_type; /* could be delta */`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`unsigned long delta_size; /* delta data size (uncompressed) */`
			`struct object_entry delta; / delta base object */`
			`struct packed_git in_pack; / already in pack */`
			`unsigned int in_pack_offset;`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`struct object_entry delta_child; / delitified objects who bases me */`
			`struct object_entry delta_sibling; / other deltified objects who`
			`* uses the same base as me`
			`*/`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`int preferred_base; /* we do not pack this, but is encouraged to`
			`* be used as the base objectto delta huge`
			`* objects against.`
			`*/`
			`int based_on_preferred; /* current delta candidate is a preferred`
			`* one, or delta against a preferred one.`
			`*/`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`};`

pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`/*`
			`* Objects we are going to pack are colected in objects array (dynamically`
			`* expanded). nr_objects & nr_alloc controls this array. They are stored`
			`* in the order we see -- typically rev-list --objects order that gives us`
			`* nice "minimum seek" order.`
			`*`
			`* sorted-by-sha ans sorted-by-type are arrays of pointers that point at`
			`* elements in the objects array. The former is used to build the pack`
			`* index (lists object names in the ascending order to help offset lookup),`
			`* and the latter is used to group similar things together by try_delta()`
			`* heuristics.`
			`*/`

Make the name of a pack-file depend on the objects packed there-in. This means that the .git/objects/pack directory is also rsync'able, since the filenames created there-in are either unique or refer to the same data. Otherwise you might not be able to pull from a directory that is partly packed without having to worry about missing objects due to pack-file name clashes. 2005-07-04 00:34:04 +02:00			`static unsigned char object_list_sha1[20];`
Add "--non-empty" flag to git-pack-objects It skips writing the pack-file if it ends up being empty. 2005-07-03 22:36:58 +02:00			`static int non_empty = 0;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`static int no_reuse_delta = 0;`
Add support for "local" packing This adds the "--local" flag to git-pack-objects, which acts like "--incremental", except that instead of ignoring all packed objects, it only ignores objects that are packed and in an alternate object tree. As a result, it effectively only does a local re-pack: any remote-packed objects will stay in the alternate object directories. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-14 00:38:28 +02:00			`static int local = 0;`
Add "--incremental" flag to git-pack-objects It won't add an object that is already in a pack to the new pack. 2005-07-03 22:08:40 +02:00			`static int incremental = 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static struct object_entry sorted_by_sha, sorted_by_type;`
			`static struct object_entry *objects = NULL;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`static int nr_objects = 0, nr_alloc = 0, nr_result = 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static const char *base_name;`
csum-file interface updates: return resulting SHA1 Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to. 2005-06-27 07:01:46 +02:00			`static unsigned char pack_file_sha1[20];`
Make pack-objects chattier. You could give -q to squelch it, but currently no tool does it. This would make 'git clone host:repo here' over ssh not silent again. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 22:01:54 +01:00			`static int progress = 1;`
also adds progress when actually writing a pack If that pack is big, it takes significant time to write and might benefit from some more eye candies as well. This is however disabled when the pack is written to stdout since in that case the output is usually piped into unpack_objects which already does its own progress reporting. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 23:41:32 +01:00			`static volatile int progress_update = 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`/*`
			`* The object names in objects array are hashed with this hashtable,`
			`* to help looking up the entry by object name. Binary search from`
			`* sorted_by_sha is also possible but this was easier to code and faster.`
			`* This hashtable is built after all the objects are seen.`
			`*/`
			`static int *object_ix = NULL;`
			`static int object_ix_hashsz = 0;`

			`/*`
			`* Pack index for existing packs give us easy access to the offsets into`
			`* corresponding pack file where each object's data starts, but the entries`
			`* do not store the size of the compressed representation (uncompressed`
			`* size is easily available by examining the pack entry header). We build`
			`* a hashtable of existing packs (pack_revindex), and keep reverse index`
			`* here -- pack index file is sorted by object name mapping to offset; this`
			`* pack_revindex[].revindex array is an ordered list of offsets, so if you`
			`* know the offset of an object, next offset is where its packed`
			`* representation ends.`
			`*/`
			`struct pack_revindex {`
			`struct packed_git *p;`
			`unsigned long *revindex;`
			`} *pack_revindex = NULL;`
			`static int pack_revindex_hashsz = 0;`

			`/*`
			`* stats`
			`*/`
			`static int written = 0;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`static int written_delta = 0;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`static int reused = 0;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`static int reused_delta = 0;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00
			`static int pack_revindex_ix(struct packed_git *p)`
			`{`
Re-fix compilation warnings. Commit 8fcf1ad9c68e15d881194c8544e7c11d33529c2b has a combination of double cast and Andreas' switch to using unsigned long ... just the latter is sufficient (and a lot less ugly than using the double cast). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-02 00:01:53 +01:00			`unsigned long ui = (unsigned long)p;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`int i;`

			`ui = ui ^ (ui >> 16); /* defeat structure alignment */`
			`i = (int)(ui % pack_revindex_hashsz);`
			`while (pack_revindex[i].p) {`
			`if (pack_revindex[i].p == p)`
			`return i;`
			`if (++i == pack_revindex_hashsz)`
			`i = 0;`
			`}`
			`return -1 - i;`
			`}`

			`static void prepare_pack_ix(void)`
			`{`
			`int num;`
			`struct packed_git *p;`
			`for (num = 0, p = packed_git; p; p = p->next)`
			`num++;`
			`if (!num)`
			`return;`
			`pack_revindex_hashsz = num * 11;`
			`pack_revindex = xcalloc(sizeof(*pack_revindex), pack_revindex_hashsz);`
			`for (p = packed_git; p; p = p->next) {`
			`num = pack_revindex_ix(p);`
			`num = - 1 - num;`
			`pack_revindex[num].p = p;`
			`}`
			`/* revindex elements are lazily initialized */`
			`}`

			`static int cmp_offset(const void a_, const void b_)`
			`{`
			`unsigned long a = (unsigned long ) a_;`
			`unsigned long b = (unsigned long ) b_;`
			`if (a < b)`
			`return -1;`
			`else if (a == b)`
			`return 0;`
			`else`
			`return 1;`
			`}`

			`/*`
			`* Ordered list of offsets of objects in the pack.`
			`*/`
			`static void prepare_pack_revindex(struct pack_revindex *rix)`
			`{`
			`struct packed_git *p = rix->p;`
			`int num_ent = num_packed_objects(p);`
			`int i;`
			`void *index = p->index_base + 256;`

			`rix->revindex = xmalloc(sizeof(unsigned long) * (num_ent + 1));`
			`for (i = 0; i < num_ent; i++) {`
			`long hl = ((long )(index + 24 * i));`
			`rix->revindex[i] = ntohl(hl);`
			`}`
			`/* This knows the pack format -- the 20-byte trailer`
			`* follows immediately after the last object data.`
			`*/`
			`rix->revindex[num_ent] = p->pack_size - 20;`
			`qsort(rix->revindex, num_ent, sizeof(unsigned long), cmp_offset);`
			`}`

			`static unsigned long find_packed_object_size(struct packed_git *p,`
			`unsigned long ofs)`
			`{`
			`int num;`
			`int lo, hi;`
			`struct pack_revindex *rix;`
			`unsigned long *revindex;`
			`num = pack_revindex_ix(p);`
			`if (num < 0)`
			`die("internal error: pack revindex uninitialized");`
			`rix = &pack_revindex[num];`
			`if (!rix->revindex)`
			`prepare_pack_revindex(rix);`
			`revindex = rix->revindex;`
			`lo = 0;`
			`hi = num_packed_objects(p) + 1;`
			`do {`
			`int mi = (lo + hi) / 2;`
			`if (revindex[mi] == ofs) {`
			`return revindex[mi+1] - ofs;`
			`}`
			`else if (ofs < revindex[mi])`
			`hi = mi;`
			`else`
			`lo = mi + 1;`
			`} while (lo < hi);`
			`die("internal error: pack revindex corrupt");`
			`}`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static void delta_against(void buf, unsigned long size, struct object_entry *entry)`
			`{`
			`unsigned long othersize, delta_size;`
			`char type[10];`
			`void *otherbuf = read_sha1_file(entry->delta->sha1, type, &othersize);`
			`void *delta_buf;`

			`if (!otherbuf)`
			`die("unable to read %s", sha1_to_hex(entry->delta->sha1));`
[PATCH] Finish initial cut of git-pack-object/git-unpack-object pair. This finishes the initial round of git-pack-object / git-unpack-object pair. They are now good enough to be used as a transport medium: - Fix delta direction in pack-objects; the original was computing delta to create the base object from the object to be squashed, which was quite unfriendly for unpacker ;-). - Add a script to test the very basics. - Implement unpacker for both regular and deltified objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-26 13:29:18 +02:00			`delta_buf = diff_delta(otherbuf, othersize,`
[PATCH] assorted delta code cleanup This is a wrap-up patch including all the cleanups I've done to the delta code and its usage. The most important change is the factorization of the delta header handling code. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-29 08:49:56 +02:00			`buf, size, &delta_size, 0);`
Add a "max_size" parameter to diff_delta() Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens. 2005-06-26 04:30:20 +02:00			`if (!delta_buf \|\| delta_size != entry->delta_size)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`die("delta size changed");`
			`free(buf);`
			`free(otherbuf);`
			`return delta_buf;`
			`}`

Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`/*`
			`* The per-object header is a pretty dense thing, which is`
			`* - first byte: low four bits are "size", then three bits of "type",`
			`* and the high bit is "size continues".`
			`* - each byte afterwards: low seven bits are size continuation,`
			`* with the high bit being "size continues"`
			`*/`
			`static int encode_header(enum object_type type, unsigned long size, unsigned char *hdr)`
			`{`
Make git pack files use little-endian size encoding This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format. 2005-06-29 07:15:57 +02:00			`int n = 1;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`unsigned char c;`

			`if (type < OBJ_COMMIT \|\| type > OBJ_DELTA)`
			`die("bad type %d", type);`

Make git pack files use little-endian size encoding This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format. 2005-06-29 07:15:57 +02:00			`c = (type << 4) \| (size & 15);`
			`size >>= 4;`
			`while (size) {`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`*hdr++ = c \| 0x80;`
Make git pack files use little-endian size encoding This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format. 2005-06-29 07:15:57 +02:00			`c = size & 0x7f;`
			`size >>= 7;`
			`n++;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`}`
			`*hdr = c;`
			`return n;`
			`}`

Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`static unsigned long write_object(struct sha1file *f,`
			`struct object_entry *entry)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`{`
			`unsigned long size;`
			`char type[10];`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`void *buf;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`unsigned char header[10];`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`unsigned hdrlen, datalen;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`enum object_type obj_type;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`int to_reuse = 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (entry->preferred_base)`
			`return 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`obj_type = entry->type;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`if (! entry->in_pack)`
			`to_reuse = 0; /* can't reuse what we don't have */`
			`else if (obj_type == OBJ_DELTA)`
			`to_reuse = 1; /* check_object() decided it for us */`
			`else if (obj_type != entry->in_pack_type)`
			`to_reuse = 0; /* pack has delta which is unusable */`
			`else if (entry->delta)`
			`to_reuse = 0; /* we want to pack afresh */`
			`else`
			`to_reuse = 1; /* we have it in-pack undeltified,`
			`* and we do not need to deltify it.`
			`*/`

			`if (! to_reuse) {`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`buf = read_sha1_file(entry->sha1, type, &size);`
			`if (!buf)`
			`die("unable to read %s", sha1_to_hex(entry->sha1));`
			`if (size != entry->size)`
			`die("object %s size inconsistency (%lu vs %lu)",`
			`sha1_to_hex(entry->sha1), size, entry->size);`
			`if (entry->delta) {`
			`buf = delta_against(buf, size, entry);`
			`size = entry->delta_size;`
			`obj_type = OBJ_DELTA;`
			`}`
			`/*`
			`* The object header is a byte of 'type' followed by zero or`
			`* more bytes of length. For deltas, the 20 bytes of delta`
			`* sha1 follows that.`
			`*/`
			`hdrlen = encode_header(obj_type, size, header);`
			`sha1write(f, header, hdrlen);`

			`if (entry->delta) {`
			`sha1write(f, entry->delta, 20);`
			`hdrlen += 20;`
			`}`
			`datalen = sha1write_compressed(f, buf, size);`
			`free(buf);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`else {`
			`struct packed_git *p = entry->in_pack;`
			`use_packed_git(p);`

			`datalen = find_packed_object_size(p, entry->in_pack_offset);`
			`buf = p->pack_base + entry->in_pack_offset;`
			`sha1write(f, buf, datalen);`
			`unuse_packed_git(p);`
			`hdrlen = 0; /* not really */`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`if (obj_type == OBJ_DELTA)`
			`reused_delta++;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`reused++;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`}`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`if (obj_type == OBJ_DELTA)`
			`written_delta++;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`written++;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`return hdrlen + datalen;`
			`}`

[PATCH] Emit base objects of a delta chain when the delta is output. Deltas are useless by themselves and when you use them you need to get to their base objects. A base object should inherit recency from the most recent deltified object that is based on it and that is what this patch teaches git-pack-objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-29 02:49:27 +02:00			`static unsigned long write_one(struct sha1file *f,`
			`struct object_entry *e,`
			`unsigned long offset)`
			`{`
			`if (e->offset)`
			`/* offset starts from header size and cannot be zero`
			`* if it is written already.`
			`*/`
			`return offset;`
			`e->offset = offset;`
			`offset += write_object(f, e);`
code comments: spell Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-29 10:30:08 +01:00			`/* if we are deltified, write out its base object. */`
[PATCH] Emit base objects of a delta chain when the delta is output. Deltas are useless by themselves and when you use them you need to get to their base objects. A base object should inherit recency from the most recent deltified object that is based on it and that is what this patch teaches git-pack-objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-29 02:49:27 +02:00			`if (e->delta)`
			`offset = write_one(f, e->delta, offset);`
			`return offset;`
			`}`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static void write_pack_file(void)`
			`{`
			`int i;`
git-pack-objects: add "--stdout" flag to write the pack file to stdout This also suppresses creation of the index file. 2005-06-28 20:10:48 +02:00			`struct sha1file *f;`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`unsigned long offset;`
			`struct pack_header hdr;`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`unsigned last_percent = 999;`
			`int do_progress = 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
git-pack-objects: add "--stdout" flag to write the pack file to stdout This also suppresses creation of the index file. 2005-06-28 20:10:48 +02:00			`if (!base_name)`
			`f = sha1fd(1, "<stdout>");`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`else {`
			`f = sha1create("%s-%s.%s", base_name,`
			`sha1_to_hex(object_list_sha1), "pack");`
			`do_progress = progress;`
			`}`
			`if (do_progress)`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`fprintf(stderr, "Writing %d objects.\n", nr_result);`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`hdr.hdr_signature = htonl(PACK_SIGNATURE);`
Make git pack files use little-endian size encoding This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format. 2005-06-29 07:15:57 +02:00			`hdr.hdr_version = htonl(PACK_VERSION);`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`hdr.hdr_entries = htonl(nr_result);`
Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently. 2005-06-28 23:21:02 +02:00			`sha1write(f, &hdr, sizeof(hdr));`
			`offset = sizeof(hdr);`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`if (!nr_result)`
			`goto done;`
also adds progress when actually writing a pack If that pack is big, it takes significant time to write and might benefit from some more eye candies as well. This is however disabled when the pack is written to stdout since in that case the output is usually piped into unpack_objects which already does its own progress reporting. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 23:41:32 +01:00			`for (i = 0; i < nr_objects; i++) {`
[PATCH] Emit base objects of a delta chain when the delta is output. Deltas are useless by themselves and when you use them you need to get to their base objects. A base object should inherit recency from the most recent deltified object that is based on it and that is what this patch teaches git-pack-objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-29 02:49:27 +02:00			`offset = write_one(f, objects + i, offset);`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (do_progress) {`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`unsigned percent = written * 100 / nr_result;`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (progress_update \|\| percent != last_percent) {`
			`fprintf(stderr, "%4u%% (%u/%u) done\r",`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`percent, written, nr_result);`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`progress_update = 0;`
			`last_percent = percent;`
			`}`
also adds progress when actually writing a pack If that pack is big, it takes significant time to write and might benefit from some more eye candies as well. This is however disabled when the pack is written to stdout since in that case the output is usually piped into unpack_objects which already does its own progress reporting. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 23:41:32 +01:00			`}`
			`}`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (do_progress)`
			`fputc('\n', stderr);`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`done:`
csum-file interface updates: return resulting SHA1 Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to. 2005-06-27 07:01:46 +02:00			`sha1close(f, pack_file_sha1, 1);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

			`static void write_index_file(void)`
			`{`
			`int i;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`struct sha1file *f = sha1create("%s-%s.%s", base_name,`
			`sha1_to_hex(object_list_sha1), "idx");`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`struct object_entry **list = sorted_by_sha;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`struct object_entry **last = list + nr_result;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`unsigned int array[256];`

			`/*`
			`* Write the first-level table (the list is sorted,`
			`* but we use a 256-entry lookup to be able to avoid`
csum-file interface updates: return resulting SHA1 Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to. 2005-06-27 07:01:46 +02:00			`* having to do eight extra binary search iterations).`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`*/`
			`for (i = 0; i < 256; i++) {`
			`struct object_entry **next = list;`
			`while (next < last) {`
			`struct object_entry entry = next;`
			`if (entry->sha1[0] != i)`
			`break;`
			`next++;`
			`}`
			`array[i] = htonl(next - sorted_by_sha);`
			`list = next;`
			`}`
git-pack-objects: write the pack files with a SHA1 csum We want to be able to check their integrity later, and putting the sha1-sum of the contents at the end is a good thing. The writing routines are generic, so we could try to re-use them for the index file, instead of having the same logic duplicated. Update unpack-objects to know about the extra 20 bytes at the end of the index. 2005-06-27 05:27:56 +02:00			`sha1write(f, array, 256 * sizeof(int));`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
			`/*`
			`* Write the actual SHA1 entries..`
			`*/`
			`list = sorted_by_sha;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`for (i = 0; i < nr_result; i++) {`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`struct object_entry entry = list++;`
			`unsigned int offset = htonl(entry->offset);`
git-pack-objects: write the pack files with a SHA1 csum We want to be able to check their integrity later, and putting the sha1-sum of the contents at the end is a good thing. The writing routines are generic, so we could try to re-use them for the index file, instead of having the same logic duplicated. Update unpack-objects to know about the extra 20 bytes at the end of the index. 2005-06-27 05:27:56 +02:00			`sha1write(f, &offset, 4);`
			`sha1write(f, entry->sha1, 20);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`
csum-file interface updates: return resulting SHA1 Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to. 2005-06-27 07:01:46 +02:00			`sha1write(f, pack_file_sha1, 20);`
			`sha1close(f, NULL, 1);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`static int locate_object_entry_hash(const unsigned char *sha1)`
			`{`
			`int i;`
			`unsigned int ui;`
			`memcpy(&ui, sha1, sizeof(unsigned int));`
			`i = ui % object_ix_hashsz;`
			`while (0 < object_ix[i]) {`
			`if (!memcmp(sha1, objects[object_ix[i]-1].sha1, 20))`
			`return i;`
			`if (++i == object_ix_hashsz)`
			`i = 0;`
			`}`
			`return -1 - i;`
			`}`

			`static struct object_entry locate_object_entry(const unsigned char sha1)`
			`{`
			`int i;`

			`if (!object_ix_hashsz)`
			`return NULL;`

			`i = locate_object_entry_hash(sha1);`
			`if (0 <= i)`
			`return &objects[object_ix[i]-1];`
			`return NULL;`
			`}`

			`static void rehash_objects(void)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`{`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`int i;`
			`struct object_entry *oe;`

			`object_ix_hashsz = nr_objects * 3;`
			`if (object_ix_hashsz < 1024)`
			`object_ix_hashsz = 1024;`
			`object_ix = xrealloc(object_ix, sizeof(int) * object_ix_hashsz);`
			`object_ix = memset(object_ix, 0, sizeof(int) * object_ix_hashsz);`
			`for (i = 0, oe = objects; i < nr_objects; i++, oe++) {`
			`int ix = locate_object_entry_hash(oe->sha1);`
			`if (0 <= ix)`
			`continue;`
			`ix = -1 - ix;`
			`object_ix[ix] = i + 1;`
			`}`
			`}`

pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`struct name_path {`
			`struct name_path *up;`
			`const char *elem;`
			`int len;`
			`};`

pack-objects: hash basename and direname a bit differently. ...so that "Makefile"s from different revs are sorted together, separate from "t/Makefile"s, but close enough. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-24 08:27:49 +01:00			`#define DIRBITS 12`

pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`static unsigned name_hash(struct name_path path, const char name)`
			`{`
			`struct name_path *p = path;`
			`const char *n = name + strlen(name);`
pack-objects: hash basename and direname a bit differently. ...so that "Makefile"s from different revs are sorted together, separate from "t/Makefile"s, but close enough. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-24 08:27:49 +01:00			`unsigned hash = 0, name_hash = 0, name_done = 0;`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00
			`if (n != name && n[-1] == '\n')`
			`n--;`
			`while (name <= --n) {`
			`unsigned char c = *n;`
pack-objects: hash basename and direname a bit differently. ...so that "Makefile"s from different revs are sorted together, separate from "t/Makefile"s, but close enough. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-24 08:27:49 +01:00			`if (c == '/' && !name_done) {`
			`name_hash = hash;`
			`name_done = 1;`
			`hash = 0;`
			`}`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`hash = hash * 11 + c;`
			`}`
pack-objects: hash basename and direname a bit differently. ...so that "Makefile"s from different revs are sorted together, separate from "t/Makefile"s, but close enough. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-24 08:27:49 +01:00			`if (!name_done) {`
			`name_hash = hash;`
			`hash = 0;`
			`}`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`for (p = path; p; p = p->up) {`
			`hash = hash * 11 + '/';`
			`n = p->elem + p->len;`
			`while (p->elem <= --n) {`
			`unsigned char c = *n;`
			`hash = hash * 11 + c;`
			`}`
			`}`
pack-objects: hash basename and direname a bit differently. ...so that "Makefile"s from different revs are sorted together, separate from "t/Makefile"s, but close enough. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-24 08:27:49 +01:00			`/*`
			`* Make sure "Makefile" and "t/Makefile" are hashed separately`
			`* but close enough.`
			`*/`
			`hash = (name_hash<<DIRBITS) \| (hash & ((1U<<DIRBITS )-1));`

			`if (0) { /* debug */`
			`n = name + strlen(name);`
			`if (n != name && n[-1] == '\n')`
			`n--;`
			`while (name <= --n)`
			`fputc(*n, stderr);`
			`for (p = path; p; p = p->up) {`
			`fputc('/', stderr);`
			`n = p->elem + p->len;`
			`while (p->elem <= --n)`
			`fputc(*n, stderr);`
			`}`
			`fprintf(stderr, "\t%08x\n", hash);`
			`}`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`return hash;`
			`}`

			`static int add_object_entry(const unsigned char *sha1, unsigned hash, int exclude)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`{`
			`unsigned int idx = nr_objects;`
			`struct object_entry *entry;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`struct packed_git *p;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`unsigned int found_offset = 0;`
			`struct packed_git *found_pack = NULL;`
pack-objects: thin pack micro-optimization. Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 06:45:45 +01:00			`int ix, status = 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (!exclude) {`
Add support for "local" packing This adds the "--local" flag to git-pack-objects, which acts like "--incremental", except that instead of ignoring all packed objects, it only ignores objects that are packed and in an alternate object tree. As a result, it effectively only does a local re-pack: any remote-packed objects will stay in the alternate object directories. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-14 00:38:28 +02:00			`for (p = packed_git; p; p = p->next) {`
			`struct pack_entry e;`
			`if (find_pack_entry_one(sha1, &e, p)) {`
			`if (incremental)`
			`return 0;`
			`if (local && !p->pack_local)`
			`return 0;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (!found_pack) {`
			`found_offset = e.offset;`
			`found_pack = e.p;`
			`}`
Add support for "local" packing This adds the "--local" flag to git-pack-objects, which acts like "--incremental", except that instead of ignoring all packed objects, it only ignores objects that are packed and in an alternate object tree. As a result, it effectively only does a local re-pack: any remote-packed objects will stay in the alternate object directories. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-14 00:38:28 +02:00			`}`
			`}`
			`}`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if ((entry = locate_object_entry(sha1)) != NULL)`
			`goto already_added;`
Add "--incremental" flag to git-pack-objects It won't add an object that is already in a pack to the new pack. 2005-07-03 22:08:40 +02:00
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`if (idx >= nr_alloc) {`
			`unsigned int needed = (idx + 1024) * 3 / 2;`
			`objects = xrealloc(objects, needed * sizeof(*entry));`
			`nr_alloc = needed;`
			`}`
			`entry = objects + idx;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`nr_objects = idx + 1;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`memset(entry, 0, sizeof(*entry));`
			`memcpy(entry->sha1, sha1, 20);`
git-pack-objects: use name information (if any) to sort objects for packing. This is incredibly cheezy. But it's cheap, and it works pretty well. 2005-06-27 00:27:28 +02:00			`entry->hash = hash;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00
			`if (object_ix_hashsz * 3 <= nr_objects * 4)`
			`rehash_objects();`
			`else {`
			`ix = locate_object_entry_hash(entry->sha1);`
			`if (0 <= ix)`
			`die("internal error in object hashing.");`
			`object_ix[-1 - ix] = idx + 1;`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`
pack-objects: thin pack micro-optimization. Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 06:45:45 +01:00			`status = 1;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00
			`already_added:`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`if (progress_update) {`
			`fprintf(stderr, "Counting objects...%d\r", nr_objects);`
			`progress_update = 0;`
			`}`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (exclude)`
			`entry->preferred_base = 1;`
			`else {`
			`if (found_pack) {`
			`entry->in_pack = found_pack;`
			`entry->in_pack_offset = found_offset;`
			`}`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`
pack-objects: thin pack micro-optimization. Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 06:45:45 +01:00			`return status;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`static void add_pbase_tree(struct tree_desc tree, struct name_path up)`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`{`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`while (tree->size) {`
			`const unsigned char *sha1;`
			`const char *name;`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`unsigned mode, hash;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`unsigned long size;`
			`char type[20];`

			`sha1 = tree_entry_extract(tree, &name, &mode);`
			`update_tree_entry(tree);`
			`if (!has_sha1_file(sha1))`
			`continue;`
			`if (sha1_object_info(sha1, type, &size))`
			`continue;`
pack-objects: thin pack micro-optimization. Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 06:45:45 +01:00
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`hash = name_hash(up, name);`
			`if (!add_object_entry(sha1, hash, 1))`
pack-objects: thin pack micro-optimization. Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 06:45:45 +01:00			`continue;`

Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (!strcmp(type, "tree")) {`
			`struct tree_desc sub;`
			`void *elem;`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`struct name_path me;`

Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`elem = read_sha1_file(sha1, type, &sub.size);`
			`sub.buf = elem;`
			`if (sub.buf) {`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`me.up = up;`
			`me.elem = name;`
			`me.len = strlen(name);`
			`add_pbase_tree(&sub, &me);`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`free(elem);`
			`}`
			`}`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`
			`}`

Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`static void add_preferred_base(unsigned char *sha1)`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`{`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`struct tree_desc tree;`
			`void *elem;`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`elem = read_object_with_reference(sha1, "tree", &tree.size, NULL);`
			`tree.buf = elem;`
			`if (!tree.buf)`
			`return;`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`if (add_object_entry(sha1, name_hash(NULL, ""), 1))`
			`add_pbase_tree(&tree, NULL);`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`free(elem);`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`}`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static void check_object(struct object_entry *entry)`
			`{`
[PATCH] Enhance sha1_file_size() into sha1_object_info() This lets us eliminate one use of map_sha1_file() outside sha1_file.c, to bring us one step closer to the packed GIT. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-27 12:34:06 +02:00			`char type[20];`

Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (entry->in_pack && !entry->preferred_base) {`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`unsigned char base[20];`
			`unsigned long size;`
			`struct object_entry *base_entry;`

			`/* We want in_pack_type even if we do not reuse delta.`
			`* There is no point not reusing non-delta representations.`
			`*/`
			`check_reuse_pack_delta(entry->in_pack,`
			`entry->in_pack_offset,`
			`base, &size,`
			`&entry->in_pack_type);`

pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`/* Check if it is delta, and the base is also an object`
			`* we are going to pack. If so we will reuse the existing`
			`* delta.`
			`*/`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`if (!no_reuse_delta &&`
			`entry->in_pack_type == OBJ_DELTA &&`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`(base_entry = locate_object_entry(base)) &&`
			`(!base_entry->preferred_base)) {`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
			`/* Depth value does not matter - find_deltas()`
			`* will never consider reused delta as the`
			`* base object to deltify other objects`
			`* against, in order to avoid circular deltas.`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`*/`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
			`/* uncompressed size of the delta data */`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`entry->size = entry->delta_size = size;`
			`entry->delta = base_entry;`
			`entry->type = OBJ_DELTA;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`entry->delta_sibling = base_entry->delta_child;`
			`base_entry->delta_child = entry;`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`return;`
			`}`
			`/* Otherwise we would do the usual */`
[PATCH] Enhance sha1_file_size() into sha1_object_info() This lets us eliminate one use of map_sha1_file() outside sha1_file.c, to bring us one step closer to the packed GIT. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-27 12:34:06 +02:00			`}`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00
			`if (sha1_object_info(entry->sha1, type, &entry->size))`
[PATCH] Enhance sha1_file_size() into sha1_object_info() This lets us eliminate one use of map_sha1_file() outside sha1_file.c, to bring us one step closer to the packed GIT. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-27 12:34:06 +02:00			`die("unable to get type of object %s",`
			`sha1_to_hex(entry->sha1));`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00
			`if (!strcmp(type, "commit")) {`
			`entry->type = OBJ_COMMIT;`
			`} else if (!strcmp(type, "tree")) {`
			`entry->type = OBJ_TREE;`
			`} else if (!strcmp(type, "blob")) {`
			`entry->type = OBJ_BLOB;`
			`} else if (!strcmp(type, "tag")) {`
			`entry->type = OBJ_TAG;`
			`} else`
			`die("unable to pack object %s of type %s",`
			`sha1_to_hex(entry->sha1), type);`
			`}`

pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`static unsigned int check_delta_limit(struct object_entry *me, unsigned int n)`
			`{`
			`struct object_entry *child = me->delta_child;`
			`unsigned int m = n;`
			`while (child) {`
			`unsigned int c = check_delta_limit(child, n + 1);`
			`if (m < c)`
			`m = c;`
			`child = child->delta_sibling;`
			`}`
			`return m;`
			`}`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static void get_object_details(void)`
			`{`
			`int i;`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`struct object_entry *entry;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`prepare_pack_ix();`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`for (i = 0, entry = objects; i < nr_objects; i++, entry++)`
			`check_object(entry);`
pack-objects: allow "thin" packs to exceed depth limits When creating a new pack to be used in .git/objects/pack/ directory, we carefully count the depth of deltified objects to be reused, so that the generated pack does not to exceed the specified depth limit for runtime efficiency. However, when we are generating a thin pack that does not contain base objects, such a pack can only be used during network transfer that is expanded on the other end upon reception, so being careful and artificially cutting the delta chain does not buy us anything except increased bandwidth requirement. This patch disables the delta chain depth limit check when reusing an existing delta. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-24 08:04:52 +01:00
			`if (nr_objects == nr_result) {`
			`/*`
			`* Depth of objects that depend on the entry -- this`
			`* is subtracted from depth-max to break too deep`
			`* delta chain because of delta data reusing.`
			`* However, we loosen this restriction when we know we`
			`* are creating a thin pack -- it will have to be`
			`* expanded on the other end anyway, so do not`
			`* artificially cut the delta chain and let it go as`
			`* deep as it wants.`
			`*/`
			`for (i = 0, entry = objects; i < nr_objects; i++, entry++)`
			`if (!entry->delta && entry->delta_child)`
			`entry->delta_limit =`
			`check_delta_limit(entry, 1);`
			`}`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

			`typedef int (entry_sort_t)(const struct object_entry , const struct object_entry *);`

			`static entry_sort_t current_sort;`

			`static int sort_comparator(const void _a, const void _b)`
			`{`
			`struct object_entry a = (struct object_entry **)_a;`
			`struct object_entry b = (struct object_entry **)_b;`
			`return current_sort(a,b);`
			`}`

			`static struct object_entry **create_sorted_list(entry_sort_t sort)`
			`{`
			`struct object_entry *list = xmalloc(nr_objects sizeof(struct object_entry *));`
			`int i;`

			`for (i = 0; i < nr_objects; i++)`
			`list[i] = objects + i;`
			`current_sort = sort;`
			`qsort(list, nr_objects, sizeof(struct object_entry *), sort_comparator);`
			`return list;`
			`}`

			`static int sha1_sort(const struct object_entry a, const struct object_entry b)`
			`{`
			`return memcmp(a->sha1, b->sha1, 20);`
			`}`

Use setenv(), fix warnings - Fix -Wundef -Wold-style-definition warnings - Make pll_free() static [jc: original patch by Timo had another unrelated bits: - Use setenv() instead of putenv() I'm postponing that part for now.] Signed-off-by: Timo Hirvonen <tihirvon@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 16:13:46 +01:00			`static struct object_entry **create_final_object_list(void)`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`{`
			`struct object_entry **list;`
			`int i, j;`

			`for (i = nr_result = 0; i < nr_objects; i++)`
			`if (!objects[i].preferred_base)`
			`nr_result++;`
			`list = xmalloc(nr_result * sizeof(struct object_entry *));`
			`for (i = j = 0; i < nr_objects; i++) {`
			`if (!objects[i].preferred_base)`
			`list[j++] = objects + i;`
			`}`
			`current_sort = sha1_sort;`
			`qsort(list, nr_result, sizeof(struct object_entry *), sort_comparator);`
			`return list;`
			`}`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`static int type_size_sort(const struct object_entry a, const struct object_entry b)`
			`{`
			`if (a->type < b->type)`
			`return -1;`
			`if (a->type > b->type)`
			`return 1;`
git-pack-objects: use name information (if any) to sort objects for packing. This is incredibly cheezy. But it's cheap, and it works pretty well. 2005-06-27 00:27:28 +02:00			`if (a->hash < b->hash)`
			`return -1;`
			`if (a->hash > b->hash)`
			`return 1;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (a->preferred_base < b->preferred_base)`
			`return -1;`
			`if (a->preferred_base > b->preferred_base)`
			`return 1;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`if (a->size < b->size)`
			`return -1;`
			`if (a->size > b->size)`
			`return 1;`
			`return a < b ? -1 : (a > b);`
			`}`

			`struct unpacked {`
			`struct object_entry *entry;`
			`void *data;`
			`};`

			`/*`
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`* We search for deltas _backwards_ in a list sorted by type and`
			`* by size, so that we see progressively smaller and smaller files.`
			`* That's because we prefer deltas to be from the bigger file`
			`* to the smaller - deletes are potentially cheaper, but perhaps`
			`* more importantly, the bigger file is likely the more recent`
			`* one.`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`*/`
Add "--depth=N" parameter to git-pack-objects to limit maximum delta depth It too defaults to 10. A nice round random number. 2005-06-26 05:17:59 +02:00			`static int try_delta(struct unpacked cur, struct unpacked old, unsigned max_depth)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`{`
			`struct object_entry *cur_entry = cur->entry;`
			`struct object_entry *old_entry = old->entry;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`int old_preferred = (old_entry->preferred_base \|\|`
			`old_entry->based_on_preferred);`
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`unsigned long size, oldsize, delta_size, sizediff;`
Add a "max_size" parameter to diff_delta() Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens. 2005-06-26 04:30:20 +02:00			`long max_size;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`void *delta_buf;`

			`/* Don't bother doing diffs between different types */`
			`if (cur_entry->type != old_entry->type)`
			`return -1;`

Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`/* We do not compute delta to create objects we are not`
			`* going to pack.`
			`*/`
			`if (cur_entry->preferred_base)`
Add a "max_size" parameter to diff_delta() Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens. 2005-06-26 04:30:20 +02:00			`return -1;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00
			`/* If the current object is at pack edge, take the depth the`
			`* objects that depend on the current object into account --`
			`* otherwise they would become too deep.`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`*/`
pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-18 05:58:45 +01:00			`if (cur_entry->delta_child) {`
			`if (max_depth <= cur_entry->delta_limit)`
			`return 0;`
			`max_depth -= cur_entry->delta_limit;`
			`}`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`size = cur_entry->size;`
			`oldsize = old_entry->size;`
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`sizediff = oldsize > size ? oldsize - size : size - oldsize;`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00
			`if (size < 50)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`return -1;`
Add "--depth=N" parameter to git-pack-objects to limit maximum delta depth It too defaults to 10. A nice round random number. 2005-06-26 05:17:59 +02:00			`if (old_entry->depth >= max_depth)`
			`return 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
			`/*`
			`* NOTE!`
			`*`
			`* We always delta from the bigger to the smaller, since that's`
			`* more space-efficient (deletes don't have to say _what_ they`
			`* delete).`
			`*/`
Add a "max_size" parameter to diff_delta() Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens. 2005-06-26 04:30:20 +02:00			`max_size = size / 2 - 20;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (cur_entry->delta) {`
			`if (cur_entry->based_on_preferred) {`
			`if (old_preferred)`
			`max_size = cur_entry->delta_size-1;`
			`else`
			`/* trying with non-preferred one when we`
			`* already have a delta based on preferred`
			`* one is pointless.`
			`*/`
pack-objects: thin pack micro-optimization. Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 06:45:45 +01:00			`return -1;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`}`
			`else if (!old_preferred)`
			`max_size = cur_entry->delta_size-1;`
			`else`
			`/* otherwise... even if delta with a`
			`* preferred one produces a bigger result than`
			`* what we currently have, which is based on a`
			`* non-preferred one, it is OK.`
			`*/`
			`;`
			`}`
git-pack-objects: use name information (if any) to sort objects for packing. This is incredibly cheezy. But it's cheap, and it works pretty well. 2005-06-27 00:27:28 +02:00			`if (sizediff >= max_size)`
			`return -1;`
[PATCH] Finish initial cut of git-pack-object/git-unpack-object pair. This finishes the initial round of git-pack-object / git-unpack-object pair. They are now good enough to be used as a transport medium: - Fix delta direction in pack-objects; the original was computing delta to create the base object from the object to be squashed, which was quite unfriendly for unpacker ;-). - Add a script to test the very basics. - Implement unpacker for both regular and deltified objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-26 13:29:18 +02:00			`delta_buf = diff_delta(old->data, oldsize,`
			`cur->data, size, &delta_size, max_size);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`if (!delta_buf)`
Add a "max_size" parameter to diff_delta() Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens. 2005-06-26 04:30:20 +02:00			`return 0;`
			`cur_entry->delta = old_entry;`
			`cur_entry->delta_size = delta_size;`
Add "--depth=N" parameter to git-pack-objects to limit maximum delta depth It too defaults to 10. A nice round random number. 2005-06-26 05:17:59 +02:00			`cur_entry->depth = old_entry->depth + 1;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`cur_entry->based_on_preferred = old_preferred;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`free(delta_buf);`
[PATCH] (patchlet) pack-objects.c: try_delta() Return value of try_delta is checked for negativeness, but the success path does not return anything, letting compiler warn and presumably return garbage. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-26 02:36:26 +02:00			`return 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

nicer eye candies for pack-objects This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 22:00:08 +01:00			`static void progress_interval(int signum)`
			`{`
			`signal(SIGALRM, progress_interval);`
			`progress_update = 1;`
			`}`

Add "--depth=N" parameter to git-pack-objects to limit maximum delta depth It too defaults to 10. A nice round random number. 2005-06-26 05:17:59 +02:00			`static void find_deltas(struct object_entry **list, int window, int depth)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`{`
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`int i, idx;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`unsigned int array_size = window * sizeof(struct unpacked);`
			`struct unpacked *array = xmalloc(array_size);`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`unsigned processed = 0;`
			`unsigned last_percent = 999;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
			`memset(array, 0, array_size);`
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`i = nr_objects;`
			`idx = 0;`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (progress)`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`fprintf(stderr, "Deltifying %d objects.\n", nr_result);`
fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:54:18 +01:00
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`while (--i >= 0) {`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`struct object_entry *entry = list[i];`
			`struct unpacked *n = array + idx;`
			`unsigned long size;`
			`char type[10];`
			`int j;`

Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`if (!entry->preferred_base)`
			`processed++;`

pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (progress) {`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`unsigned percent = processed * 100 / nr_result;`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`if (percent != last_percent \|\| progress_update) {`
			`fprintf(stderr, "%4u%% (%u/%u) done\r",`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`percent, processed, nr_result);`
pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 01:02:59 +01:00			`progress_update = 0;`
			`last_percent = percent;`
			`}`
fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:54:18 +01:00			`}`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00
			`if (entry->delta)`
			`/* This happens if we decided to reuse existing`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`* delta from a pack. "!no_reuse_delta &&" is implied.`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`*/`
			`continue;`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`free(n->data);`
			`n->entry = entry;`
			`n->data = read_sha1_file(entry->sha1, type, &size);`
			`if (size != entry->size)`
			`die("object %s inconsistent object length (%lu vs %lu)", sha1_to_hex(entry->sha1), size, entry->size);`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00
Fix delta "sliding window" code When Junio fixed the lack of a successful error code from try_delta(), that uncovered an off-by-one error in the caller. Also, some testing made it clear that we now find a lot more deltas, because we used to (incorrectly) break early on bogus "failure" cases. 2005-06-26 03:29:23 +02:00			`j = window;`
			`while (--j > 0) {`
			`unsigned int other_idx = idx + j;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`struct unpacked *m;`
Fix delta "sliding window" code When Junio fixed the lack of a successful error code from try_delta(), that uncovered an off-by-one error in the caller. Also, some testing made it clear that we now find a lot more deltas, because we used to (incorrectly) break early on bogus "failure" cases. 2005-06-26 03:29:23 +02:00			`if (other_idx >= window)`
			`other_idx -= window;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`m = array + other_idx;`
			`if (!m->entry)`
			`break;`
Add "--depth=N" parameter to git-pack-objects to limit maximum delta depth It too defaults to 10. A nice round random number. 2005-06-26 05:17:59 +02:00			`if (try_delta(n, m, depth) < 0)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`break;`
			`}`
git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller. 2005-06-26 22:43:41 +02:00			`idx++;`
			`if (idx >= window)`
			`idx = 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`
[PATCH] Plug memory leak in git-pack-objects find_deltas() should free its temporary objects before returning. [jc: Sergey, if you have [PATCH] title on the Subject line of your e-mail, please do not repeat it on the first line in your message body. Thanks.] Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-08-08 20:46:58 +02:00
nicer eye candies for pack-objects This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 22:00:08 +01:00			`if (progress)`
			`fputc('\n', stderr);`

[PATCH] Plug memory leak in git-pack-objects find_deltas() should free its temporary objects before returning. [jc: Sergey, if you have [PATCH] title on the Subject line of your e-mail, please do not repeat it on the first line in your message body. Thanks.] Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-08-08 20:46:58 +02:00			`for (i = 0; i < window; ++i)`
			`free(array[i].data);`
			`free(array);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`

pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-22 10:28:13 +02:00			`static void prepare_pack(int window, int depth)`
			`{`
pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 02:34:29 +01:00			`get_object_details();`
pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-22 10:28:13 +02:00			`sorted_by_type = create_sorted_list(type_size_sort);`
			`if (window && depth)`
			`find_deltas(sorted_by_type, window+1, depth);`
			`}`

			`static int reuse_cached_pack(unsigned char *sha1, int pack_to_stdout)`
			`{`
			`static const char cache[] = "pack-cache/pack-%s.%s";`
			`char cached_pack, cached_idx;`
			`int ifd, ofd, ifd_ix = -1;`

			`cached_pack = git_path(cache, sha1_to_hex(sha1), "pack");`
			`ifd = open(cached_pack, O_RDONLY);`
			`if (ifd < 0)`
			`return 0;`

			`if (!pack_to_stdout) {`
			`cached_idx = git_path(cache, sha1_to_hex(sha1), "idx");`
			`ifd_ix = open(cached_idx, O_RDONLY);`
			`if (ifd_ix < 0) {`
			`close(ifd);`
			`return 0;`
			`}`
			`}`

pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`if (progress)`
			`fprintf(stderr, "Reusing %d objects pack %s\n", nr_objects,`
			`sha1_to_hex(sha1));`
pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-22 10:28:13 +02:00
			`if (pack_to_stdout) {`
			`if (copy_fd(ifd, 1))`
			`exit(1);`
			`close(ifd);`
			`}`
			`else {`
			`char name[PATH_MAX];`
			`snprintf(name, sizeof(name),`
			`"%s-%s.%s", base_name, sha1_to_hex(sha1), "pack");`
			`ofd = open(name, O_CREAT \| O_EXCL \| O_WRONLY, 0666);`
			`if (ofd < 0)`
			`die("unable to open %s (%s)", name, strerror(errno));`
			`if (copy_fd(ifd, ofd))`
			`exit(1);`
			`close(ifd);`

			`snprintf(name, sizeof(name),`
			`"%s-%s.%s", base_name, sha1_to_hex(sha1), "idx");`
			`ofd = open(name, O_CREAT \| O_EXCL \| O_WRONLY, 0666);`
			`if (ofd < 0)`
			`die("unable to open %s (%s)", name, strerror(errno));`
			`if (copy_fd(ifd_ix, ofd))`
			`exit(1);`
			`close(ifd_ix);`
			`puts(sha1_to_hex(sha1));`
			`}`

			`return 1;`
			`}`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`int main(int argc, char **argv)`
			`{`
Make the name of a pack-file depend on the objects packed there-in. This means that the .git/objects/pack directory is also rsync'able, since the filenames created there-in are either unique or refer to the same data. Otherwise you might not be able to pull from a directory that is partly packed without having to worry about missing objects due to pack-file name clashes. 2005-07-04 00:34:04 +02:00			`SHA_CTX ctx;`
git-pack-objects: use name information (if any) to sort objects for packing. This is incredibly cheezy. But it's cheap, and it works pretty well. 2005-06-27 00:27:28 +02:00			`char line[PATH_MAX + 20];`
git-pack-objects: add "--stdout" flag to write the pack file to stdout This also suppresses creation of the index file. 2005-06-28 20:10:48 +02:00			`int window = 10, depth = 10, pack_to_stdout = 0;`
Fix packname hash generation. This changes the generation of hash packfiles have in their names, from "hash of object names as fed to us" to "hash of object names in the resulting pack, in the order they appear in the index file". The new "git-index-pack" command is taught to output the computed hash value to its standard output. With this, we can store downloaded pack in a temporary file without knowing its final name, run git-index-pack to generate idx for it while finding out its final name, and then rename the pack and idx to their final names. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-13 01:54:19 +02:00			`struct object_entry **list;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`int i;`

Make the rest of commands work from a subdirectory. These commands are converted to run from a subdirectory. commit-tree convert-objects merge-base merge-index mktag pack-objects pack-redundant prune-packed read-tree tar-tree unpack-file unpack-objects update-server-info write-tree Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-11-26 09:50:02 +01:00			`setup_git_directory();`

git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`for (i = 1; i < argc; i++) {`
			`const char *arg = argv[i];`

			`if (*arg == '-') {`
Add "--non-empty" flag to git-pack-objects It skips writing the pack-file if it ends up being empty. 2005-07-03 22:36:58 +02:00			`if (!strcmp("--non-empty", arg)) {`
			`non_empty = 1;`
			`continue;`
			`}`
Add support for "local" packing This adds the "--local" flag to git-pack-objects, which acts like "--incremental", except that instead of ignoring all packed objects, it only ignores objects that are packed and in an alternate object tree. As a result, it effectively only does a local re-pack: any remote-packed objects will stay in the alternate object directories. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-14 00:38:28 +02:00			`if (!strcmp("--local", arg)) {`
			`local = 1;`
			`continue;`
			`}`
Add "--incremental" flag to git-pack-objects It won't add an object that is already in a pack to the new pack. 2005-07-03 22:08:40 +02:00			`if (!strcmp("--incremental", arg)) {`
			`incremental = 1;`
			`continue;`
			`}`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`if (!strncmp("--window=", arg, 9)) {`
			`char *end;`
git-pack-objects: make "--window=x" semantics more logical. A zero disables delta generation (like before), but we make the window be one bigger than specified, since we use one entry for the one to be tested (it used to be that "--window=1" was meaningless, since we'd have used up the single-entry window with the entry to be tested, and had no chance of actually ever finding a delta). The default window remains at 10, but now it really means "test the 10 closest objects", not "test the 9 closest objects". 2005-06-26 04:35:47 +02:00			`window = strtoul(arg+9, &end, 0);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`if (!arg[9] \|\| *end)`
			`usage(pack_usage);`
			`continue;`
			`}`
Add "--depth=N" parameter to git-pack-objects to limit maximum delta depth It too defaults to 10. A nice round random number. 2005-06-26 05:17:59 +02:00			`if (!strncmp("--depth=", arg, 8)) {`
			`char *end;`
			`depth = strtoul(arg+8, &end, 0);`
			`if (!arg[8] \|\| *end)`
			`usage(pack_usage);`
			`continue;`
			`}`
Make pack-objects chattier. You could give -q to squelch it, but currently no tool does it. This would make 'git clone host:repo here' over ssh not silent again. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 22:01:54 +01:00			`if (!strcmp("-q", arg)) {`
			`progress = 0;`
			`continue;`
			`}`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`if (!strcmp("--no-reuse-delta", arg)) {`
			`no_reuse_delta = 1;`
			`continue;`
			`}`
git-pack-objects: add "--stdout" flag to write the pack file to stdout This also suppresses creation of the index file. 2005-06-28 20:10:48 +02:00			`if (!strcmp("--stdout", arg)) {`
			`pack_to_stdout = 1;`
			`continue;`
			`}`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`usage(pack_usage);`
			`}`
			`if (base_name)`
			`usage(pack_usage);`
			`base_name = arg;`
			`}`

git-pack-objects: add "--stdout" flag to write the pack file to stdout This also suppresses creation of the index file. 2005-06-28 20:10:48 +02:00			`if (pack_to_stdout != !base_name)`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`usage(pack_usage);`

Add support for "local" packing This adds the "--local" flag to git-pack-objects, which acts like "--incremental", except that instead of ignoring all packed objects, it only ignores objects that are packed and in an alternate object tree. As a result, it effectively only does a local re-pack: any remote-packed objects will stay in the alternate object directories. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-14 00:38:28 +02:00			`prepare_packed_git();`
nicer eye candies for pack-objects This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 22:00:08 +01:00
fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:54:18 +01:00			`if (progress) {`
nicer eye candies for pack-objects This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 22:00:08 +01:00			`struct itimerval v;`
			`v.it_interval.tv_sec = 1;`
			`v.it_interval.tv_usec = 0;`
			`v.it_value = v.it_interval;`
			`signal(SIGALRM, progress_interval);`
			`setitimer(ITIMER_REAL, &v, NULL);`
fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:54:18 +01:00			`fprintf(stderr, "Generating pack...\n");`
			`}`
nicer eye candies for pack-objects This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 22:00:08 +01:00
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`while (fgets(line, sizeof(line), stdin) != NULL) {`
			`unsigned char sha1[20];`
git-pack-objects: use name information (if any) to sort objects for packing. This is incredibly cheezy. But it's cheap, and it works pretty well. 2005-06-27 00:27:28 +02:00
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (line[0] == '-') {`
			`if (get_sha1_hex(line+1, sha1))`
			`die("expected edge sha1, got garbage:\n %s",`
			`line+1);`
			`add_preferred_base(sha1);`
			`continue;`
fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:54:18 +01:00			`}`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`if (get_sha1_hex(line, sha1))`
git-repack: Properly abort in corrupt repository In a corrupt repository, git-repack produces a pack that does not contain needed objects without complaining, and the result of this combined with -d flag can be very painful -- e.g. a lossage of one tree object can lead to lossage of blobs reachable only through that tree. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-11-21 21:38:31 +01:00			`die("expected sha1, got garbage:\n %s", line);`
pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`add_object_entry(sha1, name_hash(NULL, line+41), 0);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`}`
fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:54:18 +01:00			`if (progress)`
			`fprintf(stderr, "Done counting %d objects.\n", nr_objects);`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`sorted_by_sha = create_final_object_list();`
			`if (non_empty && !nr_result)`
Add "--non-empty" flag to git-pack-objects It skips writing the pack-file if it ends up being empty. 2005-07-03 22:36:58 +02:00			`return 0;`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00
Fix packname hash generation. This changes the generation of hash packfiles have in their names, from "hash of object names as fed to us" to "hash of object names in the resulting pack, in the order they appear in the index file". The new "git-index-pack" command is taught to output the computed hash value to its standard output. With this, we can store downloaded pack in a temporary file without knowing its final name, run git-index-pack to generate idx for it while finding out its final name, and then rename the pack and idx to their final names. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-13 01:54:19 +02:00			`SHA1_Init(&ctx);`
			`list = sorted_by_sha;`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`for (i = 0; i < nr_result; i++) {`
Fix packname hash generation. This changes the generation of hash packfiles have in their names, from "hash of object names as fed to us" to "hash of object names in the resulting pack, in the order they appear in the index file". The new "git-index-pack" command is taught to output the computed hash value to its standard output. With this, we can store downloaded pack in a temporary file without knowing its final name, run git-index-pack to generate idx for it while finding out its final name, and then rename the pack and idx to their final names. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-13 01:54:19 +02:00			`struct object_entry entry = list++;`
			`SHA1_Update(&ctx, entry->sha1, 20);`
			`}`
			`SHA1_Final(object_list_sha1, &ctx);`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`if (progress && (nr_objects != nr_result))`
			`fprintf(stderr, "Result has %d objects.\n", nr_result);`
Fix packname hash generation. This changes the generation of hash packfiles have in their names, from "hash of object names as fed to us" to "hash of object names in the resulting pack, in the order they appear in the index file". The new "git-index-pack" command is taught to output the computed hash value to its standard output. With this, we can store downloaded pack in a temporary file without knowing its final name, run git-index-pack to generate idx for it while finding out its final name, and then rename the pack and idx to their final names. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-13 01:54:19 +02:00
pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-22 10:28:13 +02:00			`if (reuse_cached_pack(object_list_sha1, pack_to_stdout))`
			`;`
			`else {`
Merge branches 'jc/rev-list' and 'jc/pack-thin' * jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next") 2006-02-25 06:55:23 +01:00			`if (nr_result)`
			`prepare_pack(window, depth);`
also adds progress when actually writing a pack If that pack is big, it takes significant time to write and might benefit from some more eye candies as well. This is however disabled when the pack is written to stdout since in that case the output is usually piped into unpack_objects which already does its own progress reporting. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 23:41:32 +01:00			`if (progress && pack_to_stdout) {`
			`/* the other end usually displays progress itself */`
			`struct itimerval v = {{0,},};`
			`setitimer(ITIMER_REAL, &v, NULL);`
			`signal(SIGALRM, SIG_IGN );`
			`progress_update = 0;`
			`}`
			`write_pack_file();`
pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-22 10:28:13 +02:00			`if (!pack_to_stdout) {`
			`write_index_file();`
			`puts(sha1_to_hex(object_list_sha1));`
			`}`
Make the name of a pack-file depend on the objects packed there-in. This means that the .git/objects/pack directory is also rsync'able, since the filenames created there-in are either unique or refer to the same data. Otherwise you might not be able to pull from a directory that is partly packed without having to worry about missing objects due to pack-file name clashes. 2005-07-04 00:34:04 +02:00			`}`
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 20:55:51 +01:00			`if (progress)`
			`fprintf(stderr, "Total %d, written %d (delta %d), reused %d (delta %d)\n",`
Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 23:47:21 +01:00			`nr_result, written, written_delta, reused, reused_delta);`
git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no? 2005-06-25 23:42:43 +02:00			`return 0;`
			`}`