mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-18 06:54:55 +01:00

1634 lines

42 KiB

C

Raw Normal View History

sparse: Fix an "symbol 'cmd_index_pack' not declared" warning Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-04-07 20:23:40 +02:00			`#include "builtin.h"`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`#include "delta.h"`
			`#include "pack.h"`
			`#include "csum-file.h"`
Use blob_, commit_, tag_, and tree_type throughout. This replaces occurences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-02 14:44:09 +02:00			`#include "blob.h"`
			`#include "commit.h"`
			`#include "tag.h"`
			`#include "tree.h"`
common progress display support Instead of having this code duplicated in multiple places, let's have a common interface for progress display. If someday someone wishes to display a cheezy progress bar instead then only one file will have to be changed. Note: I left merge-recursive.c out since it has a strange notion of progress as it apparently increase the expected total number as it goes. Someone with more intimate knowledge of what that is supposed to mean might look at converting it to the common progress interface. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-18 20:27:45 +02:00			`#include "progress.h"`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`#include "fsck.h"`
Add calls to git_extract_argv0_path() in programs that call git_config_* Programs that use git_config need to find the global configuration. When runtime prefix computation is enabled, this requires that git_extract_argv0_path() is called early in the program's main(). This commit adds the necessary calls. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-18 13:00:12 +01:00			`#include "exec_cmd.h"`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`#include "streaming.h"`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`#include "thread-utils.h"`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
			`static const char index_pack_usage[] =`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`"git index-pack [-v] [-o <index-file>] [--keep \| --keep=<msg>] [--verify] [--strict] (<pack-file> \| --stdin [--fix-thin] [<pack-file>])";`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
standardize brace placement in struct definitions In a struct definitions, unlike functions, the prevailing style is for the opening brace to go on the same line as the struct name, like so: struct foo { int bar; char baz; }; Indeed, grepping for 'struct [a-z_] {$' yields about 5 times as many matches as 'struct [a-z_]*$'. Linus sayeth: Heretic people all over the world have claimed that this inconsistency is ... well ... inconsistent, but all right-thinking people know that (a) K&R are _right_ and (b) K&R are right. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-16 08:08:34 +01:00			`struct object_entry {`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`struct pack_idx_entry idx;`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`unsigned long size;`
			`unsigned int hdr_size;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`enum object_type type;`
			`enum object_type real_type;`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`unsigned delta_depth;`
			`int base_object_no;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`};`

teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`union delta_base {`
			`unsigned char sha1[20];`
add overflow tests on pack offset variables Change a few size and offset variables to more appropriate type, then add overflow tests on those offsets. This prevents any bad data to be generated/processed if off_t happens to not be large enough to handle some big packs. Better be safe than sorry. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:30 +02:00			`off_t offset;`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`};`

index-pack: Refactor base arguments of resolve_delta into a struct We need to discard base objects which are not recently used if our memory gets low, such as when we are unpacking a long delta chain of a very large object. To support tracking the available base objects we combine the pointer and size into a struct. Future changes would allow the data pointer to be free'd and marked NULL if memory gets low. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:44 +02:00			`struct base_data {`
index-pack: Chain the struct base_data on the stack for traversal We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:45 +02:00			`struct base_data *base;`
			`struct base_data *child;`
index-pack: Track the object_entry that creates each base_data If we free the data stored within a base_data we need the struct object_entry to get the data back again for use with another dependent delta. Storing the object_entry* in base_data makes it simple to call get_data_from_pack() to recover the compressed information. This however means that we must add the missing base object to the end of our packfile prior to calling resolve_delta() on each of the dependent deltas. Adding the base first ensures we can read the base back from the pack we are indexing, as if it had been included by the remote side. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:46 +02:00			`struct object_entry *obj;`
index-pack: Refactor base arguments of resolve_delta into a struct We need to discard base objects which are not recently used if our memory gets low, such as when we are unpacking a long delta chain of a very large object. To support tracking the available base objects we combine the pointer and size into a struct. Future changes would allow the data pointer to be free'd and marked NULL if memory gets low. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:44 +02:00			`void *data;`
			`unsigned long size;`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`int ref_first, ref_last;`
			`int ofs_first, ofs_last;`
index-pack: Refactor base arguments of resolve_delta into a struct We need to discard base objects which are not recently used if our memory gets low, such as when we are unpacking a long delta chain of a very large object. To support tracking the available base objects we combine the pointer and size into a struct. Future changes would allow the data pointer to be free'd and marked NULL if memory gets low. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:44 +02:00			`};`

index-pack: Disable threading on cygwin The Cygwin implementation of pread() is not thread-safe since, just like the emulation provided by compat/pread.c, it uses a sequence of seek-read-seek calls. In order to avoid failues due to thread-safety issues, commit b038a61 disables threading when NO_PREAD is defined. (ie when using the emulation code in compat/pread.c). We introduce a new build variable, NO_THREAD_SAFE_PREAD, which allows use to disable the threaded index-pack code on cygwin, in addition to the above NO_PREAD case. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-06-26 20:19:32 +02:00			`#if !defined(NO_PTHREADS) && defined(NO_THREAD_SAFE_PREAD)`
			`/* pread() emulation is not thread-safe. Disable threading. */`
index-pack: disable threading if NO_PREAD is defined NO_PREAD simulates pread() as a sequence of seek, read, seek in compat/pread.c. The simulation is not thread-safe because another thread could move the file offset away in the middle of pread operation. Do not allow threading in that case. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:56 +02:00			`#define NO_PTHREADS`
			`#endif`

index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`struct thread_local {`
			`#ifndef NO_PTHREADS`
			`pthread_t thread;`
			`#endif`
			`struct base_data *base_cache;`
			`size_t base_cache_used;`
			`};`

index-pack: compare only the first 20-bytes of the key. The "union delta_base" is a strange beast. It is a 20-byte binary blob key to search a binary searchable deltas[] array, each element of which uses it to represent its base object with either a full 20-byte SHA-1 or an offset in the pack. Which representation is used is determined by another field of the deltas[] array element, obj->type, so there is no room for confusion, as long as we make sure we compare the keys for the same type only with appropriate length. The code compared the full union with memcmp(). When storing the in-pack offset, the union was first cleared before storing an unsigned long, so comparison worked fine. On 64-bit architectures, however, the union typically is 24-byte long; the code did not clear the remaining 4-byte alignment padding when storing a full 20-byte SHA-1 representation. Using memcmp() to compare the whole union was wrong. This fixes the comparison to look at the first 20-bytes of the union, regardless of the architecture. As long as ulong is smaller than 20-bytes this works fine. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-17 22:23:26 +02:00			`/*`
			`* Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want`
			`* to memcmp() only the first 20 bytes.`
			`*/`
			`#define UNION_BASE_SZ 20`

index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`#define FLAG_LINK (1u<<20)`
			`#define FLAG_CHECKED (1u<<21)`

standardize brace placement in struct definitions In a struct definitions, unlike functions, the prevailing style is for the opening brace to go on the same line as the struct name, like so: struct foo { int bar; char baz; }; Indeed, grepping for 'struct [a-z_] {$' yields about 5 times as many matches as 'struct [a-z_]*$'. Linus sayeth: Heretic people all over the world have claimed that this inconsistency is ... well ... inconsistent, but all right-thinking people know that (a) K&R are _right_ and (b) K&R are right. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-16 08:08:34 +01:00			`struct delta_entry {`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`union delta_base base;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`int obj_no;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`};`

			`static struct object_entry *objects;`
			`static struct delta_entry *deltas;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`static struct thread_local nothread_data;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`static int nr_objects;`
			`static int nr_deltas;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`static int nr_resolved_deltas;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`static int nr_threads;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`static int from_stdin;`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`static int strict;`
add progress status to index-pack This is more interesting to look at when performing a big fetch. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:32:59 +02:00			`static int verbose;`

make struct progress an opaque type This allows for better management of progress "object" existence, as well as making the progress display implementation more independent from its callers. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:32 +01:00			`static struct progress *progress;`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`/* We always read in 4kB chunks. */`
			`static unsigned char input_buffer[4096];`
add overflow tests on pack offset variables Change a few size and offset variables to more appropriate type, then add overflow tests on those offsets. This prevents any bad data to be generated/processed if off_t happens to not be large enough to handle some big packs. Better be safe than sorry. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:30 +02:00			`static unsigned int input_offset, input_len;`
			`static off_t consumed_bytes;`
index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00			`static unsigned deepest_delta;`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`static git_SHA_CTX input_ctx;`
compute object CRC32 with index-pack Same as previous patch but for index-pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:32 +02:00			`static uint32_t input_crc32;`
index-pack usage of mmap() is unacceptably slower on many OSes other than Linux It was reported by Randal L. Schwartz <merlyn@stonehenge.com> that indexing the Linux repository ~150MB pack takes about an hour on OS x while it's a minute on Linux. It seems that the OS X mmap() implementation is more than 2 orders of magnitude slower than the Linux one. Linus proposed a patch replacing mmap() with pread() bringing index-pack performance on OS X in line with the Linux one. The performances on Linux also improved by a small margin. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-19 16:53:08 +01:00			`static int input_fd, output_fd, pack_fd;`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`#ifndef NO_PTHREADS`

			`static struct thread_local *thread_data;`
			`static int nr_dispatched;`
			`static int threads_active;`

			`static pthread_mutex_t read_mutex;`
			`#define read_lock() lock_mutex(&read_mutex)`
			`#define read_unlock() unlock_mutex(&read_mutex)`

			`static pthread_mutex_t counter_mutex;`
			`#define counter_lock() lock_mutex(&counter_mutex)`
			`#define counter_unlock() unlock_mutex(&counter_mutex)`

			`static pthread_mutex_t work_mutex;`
			`#define work_lock() lock_mutex(&work_mutex)`
			`#define work_unlock() unlock_mutex(&work_mutex)`

			`static pthread_key_t key;`

			`static inline void lock_mutex(pthread_mutex_t *mutex)`
			`{`
			`if (threads_active)`
			`pthread_mutex_lock(mutex);`
			`}`

			`static inline void unlock_mutex(pthread_mutex_t *mutex)`
			`{`
			`if (threads_active)`
			`pthread_mutex_unlock(mutex);`
			`}`

			`/*`
			`* Mutex and conditional variable can't be statically-initialized on Windows.`
			`*/`
			`static void init_thread(void)`
			`{`
			`init_recursive_mutex(&read_mutex);`
			`pthread_mutex_init(&counter_mutex, NULL);`
			`pthread_mutex_init(&work_mutex, NULL);`
			`pthread_key_create(&key, NULL);`
			`thread_data = xcalloc(nr_threads, sizeof(*thread_data));`
			`threads_active = 1;`
			`}`

			`static void cleanup_thread(void)`
			`{`
			`if (!threads_active)`
			`return;`
			`threads_active = 0;`
			`pthread_mutex_destroy(&read_mutex);`
			`pthread_mutex_destroy(&counter_mutex);`
			`pthread_mutex_destroy(&work_mutex);`
			`pthread_key_delete(key);`
			`free(thread_data);`
			`}`

			`#else`

			`#define read_lock()`
			`#define read_unlock()`

			`#define counter_lock()`
			`#define counter_unlock()`

			`#define work_lock()`
			`#define work_unlock()`

			`#endif`


index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`static int mark_link(struct object obj, int type, void data)`
			`{`
			`if (!obj)`
			`return -1;`

			`if (type != OBJ_ANY && obj->type != type)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("object type mismatch at %s"), sha1_to_hex(obj->sha1));`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00
			`obj->flags \|= FLAG_LINK;`
			`return 0;`
			`}`

			`/* The content of each linked object must have been checked`
			`or it must be already present in the object database */`
			`static void check_object(struct object *obj)`
			`{`
			`if (!obj)`
			`return;`

			`if (!(obj->flags & FLAG_LINK))`
			`return;`

			`if (!(obj->flags & FLAG_CHECKED)) {`
			`unsigned long size;`
			`int type = sha1_object_info(obj->sha1, &size);`
			`if (type != obj->type \|\| type <= 0)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("object of unexpected type"));`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`obj->flags \|= FLAG_CHECKED;`
			`return;`
			`}`
			`}`

			`static void check_objects(void)`
			`{`
			`unsigned i, max;`

			`max = get_max_object_index();`
			`for (i = 0; i < max; i++)`
			`check_object(get_indexed_object(i));`
			`}`


make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`/* Discard current buffer used content. */`
sparse fix: non-ANSI function declaration The declaration of discard_cache() in cache.h already has its "void". Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-18 13:07:06 +01:00			`static void flush(void)`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`{`
			`if (input_offset) {`
			`if (output_fd >= 0)`
			`write_or_die(output_fd, input_buffer, input_offset);`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Update(&input_ctx, input_buffer, input_offset);`
Don't use memcpy when source and dest. buffers may overlap git-index-pack can call memcpy with overlapping source and destination buffers. The patch below makes it use memmove instead. If you want to demonstrate a failure, add the following two lines + if (input_offset < input_len) + abort (); before the existing memcpy call (shown in the patch below), and then run this: (cd t; sh ./t5500-fetch-pack.sh) Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-11 19:06:34 +01:00			`memmove(input_buffer, input_buffer + input_offset, input_len);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`input_offset = 0;`
			`}`
			`}`

add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`/*`
			`* Make sure at least "min" bytes are available in the buffer, and`
			`* return the pointer to the buffer.`
			`*/`
index-pack: minor fixes to comment and function name Use proper english. Be more exact in one comment. [jc: I threw in a bit of style clean-up as well] Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-27 22:14:23 +02:00			`static void *fill(int min)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`if (min <= input_len)`
			`return input_buffer + input_offset;`
			`if (min > sizeof(input_buffer))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(Q_("cannot fill %d byte",`
			`"cannot fill %d bytes",`
			`min),`
			`min);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`flush();`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`do {`
Ensure return value from xread() is always stored into an ssize_t This patch fixes all calls to xread() where the return value is not stored into an ssize_t. The patch should not have any effect whatsoever, other than putting better/more appropriate type names on variables. Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-15 14:49:22 +02:00			`ssize_t ret = xread(input_fd, input_buffer + input_len,`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`sizeof(input_buffer) - input_len);`
			`if (ret <= 0) {`
			`if (!ret)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("early EOF"));`
			`die_errno(_("read error on input"));`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`}`
			`input_len += ret;`
make display of total transferred more accurate The throughput display needs a delay period before accounting and displaying anything. Yet it might be called after some amount of data has already been transferred. The display of total data is therefore accounted late and therefore smaller than the reality. Let's call display_throughput() with an absolute amount of transferred data instead of a relative number, and let the throughput code find the relative amount of data by itself as needed. This way the displayed total is always exact. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-05 04:15:41 +01:00			`if (from_stdin)`
			`display_throughput(progress, consumed_bytes + input_len);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`} while (input_len < min);`
			`return input_buffer;`
			`}`

			`static void use(int bytes)`
			`{`
			`if (bytes > input_len)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("used more bytes than were available"));`
compute object CRC32 with index-pack Same as previous patch but for index-pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:32 +02:00			`input_crc32 = crc32(input_crc32, input_buffer + input_offset, bytes);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`input_len -= bytes;`
			`input_offset += bytes;`
add overflow tests on pack offset variables Change a few size and offset variables to more appropriate type, then add overflow tests on those offsets. This prevents any bad data to be generated/processed if off_t happens to not be large enough to handle some big packs. Better be safe than sorry. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:30 +02:00
			`/* make sure off_t is sufficiently large not to wrap */`
do not depend on signed integer overflow Signed integer overflow is not defined in C, so do not depend on it. This fixes a problem with GCC 4.4.0 and -O3 where the optimizer would consider "consumed_bytes > consumed_bytes + bytes" as a constant expression, and never execute the die()-call. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-05 09:24:10 +02:00			`if (signed_add_overflows(consumed_bytes, bytes))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("pack too large for current definition of off_t"));`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`consumed_bytes += bytes;`
			`}`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`static const char open_pack_file(const char pack_name)`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`{`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`if (from_stdin) {`
			`input_fd = 0;`
			`if (!pack_name) {`
Appease Sun Studio by renaming "tmpfile" On Solaris the system headers define the "tmpfile" name, which'll cause Git compiled with Sun Studio 12 Update 1 to whine about us redefining the name: "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) Just renaming the "tmpfile" variable to "tmp_file" in the relevant places is the easiest way to fix this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-21 02:18:21 +01:00			`static char tmp_file[PATH_MAX];`
			`output_fd = odb_mkstemp(tmp_file, sizeof(tmp_file),`
Make sure objects/pack exists before creating a new pack In a repository created with git older than f49fb35 (git-init-db: create "pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is not created upon initialization. It was Ok because subdirectories are created as needed inside directories init-db creates, and back then, packfiles were recent invention. After the said commit, new codepaths started relying on the presense of objects/pack/ directory in the repository. This was exacerbated with 8b4eb6b (Do not perform cross-directory renames when creating packs, 2008-09-22) that moved the location temporary pack files are created from objects/ directory to objects/pack/ directory, because moving temporary to the final location was done carefully with lazy leading directory creation. Many packfile related operations in such an old repository can fail mysteriously because of this. This commit introduces two helper functions to make things work better. - odb_mkstemp() is a specialized version of mkstemp() to refactor the code and teach it to create leading directories as needed; - odb_pack_keep() refactors the code to create a ".keep" file while create leading directories as needed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-25 08:11:29 +01:00			`"pack/tmp_pack_XXXXXX");`
Appease Sun Studio by renaming "tmpfile" On Solaris the system headers define the "tmpfile" name, which'll cause Git compiled with Sun Studio 12 Update 1 to whine about us redefining the name: "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) Just renaming the "tmpfile" variable to "tmp_file" in the relevant places is the easiest way to fix this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-21 02:18:21 +01:00			`pack_name = xstrdup(tmp_file);`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`} else`
			`output_fd = open(pack_name, O_CREAT\|O_EXCL\|O_RDWR, 0600);`
			`if (output_fd < 0)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("unable to create '%s'"), pack_name);`
index-pack usage of mmap() is unacceptably slower on many OSes other than Linux It was reported by Randal L. Schwartz <merlyn@stonehenge.com> that indexing the Linux repository ~150MB pack takes about an hour on OS x while it's a minute on Linux. It seems that the OS X mmap() implementation is more than 2 orders of magnitude slower than the Linux one. Linus proposed a patch replacing mmap() with pread() bringing index-pack performance on OS X in line with the Linux one. The performances on Linux also improved by a small margin. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-19 16:53:08 +01:00			`pack_fd = output_fd;`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`} else {`
			`input_fd = open(pack_name, O_RDONLY);`
			`if (input_fd < 0)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("cannot open packfile '%s'"), pack_name);`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`output_fd = -1;`
index-pack usage of mmap() is unacceptably slower on many OSes other than Linux It was reported by Randal L. Schwartz <merlyn@stonehenge.com> that indexing the Linux repository ~150MB pack takes about an hour on OS x while it's a minute on Linux. It seems that the OS X mmap() implementation is more than 2 orders of magnitude slower than the Linux one. Linus proposed a patch replacing mmap() with pread() bringing index-pack performance on OS X in line with the Linux one. The performances on Linux also improved by a small margin. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-19 16:53:08 +01:00			`pack_fd = input_fd;`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`}`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Init(&input_ctx);`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`return pack_name;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`

			`static void parse_pack_header(void)`
			`{`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`struct pack_header *hdr = fill(sizeof(struct pack_header));`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
			`/* Header consistency check */`
			`if (hdr->hdr_signature != htonl(PACK_SIGNATURE))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("pack signature mismatch"));`
remove delta-against-self bit After experimenting with code to add the ability to encode a delta against part of the deltified file, it turns out that resulting packs are _bigger_ than when this ability is not used. The raw delta output might be smaller, but it doesn't compress as well using gzip with a negative net saving on average. Said bit would in fact be more useful to allow for encoding the copying of chunks larger than 64KB providing more savings with large files. This will correspond to packs version 3. While the current code still produces packs version 2, it is made future proof so pack versions 2 and 3 are accepted. Any pack version 2 are compatible with version 3 since the redefined bit was never used before. When enough time has passed, code to use that bit to produce version 3 packs could be added. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-09 23:50:04 +01:00			`if (!pack_version_ok(hdr->hdr_version))`
Fix some warnings (on cygwin) to allow -Werror When printing valuds of type uint32_t, we should use PRIu32, and should not assume that it is unsigned int. On 32-bit platforms, it could be defined as unsigned long. The same caution applies to ntohl(). Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-03 17:52:09 +02:00			`die("pack version %"PRIu32" unsupported",`
			`ntohl(hdr->hdr_version));`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
			`nr_objects = ntohl(hdr->hdr_entries);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`use(sizeof(struct pack_header));`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`

increase portability of NORETURN declarations Some compilers (including at least MSVC) support NORETURN on function declarations, but only before the function-name. This patch makes it possible to define NORETURN to something meaningful for those compilers. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Jeff King <peff@peff.net> 2009-09-30 20:05:49 +02:00			`static NORETURN void bad_object(unsigned long offset, const char *format,`
			`...) __attribute__((format (printf, 2, 3)));`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
Fix sparse warnings Fix warnings from 'make check'. - These files don't include 'builtin.h' causing sparse to complain that cmd_* isn't declared: builtin/clone.c:364, builtin/fetch-pack.c:797, builtin/fmt-merge-msg.c:34, builtin/hash-object.c:78, builtin/merge-index.c:69, builtin/merge-recursive.c:22 builtin/merge-tree.c:341, builtin/mktag.c:156, builtin/notes.c:426 builtin/notes.c:822, builtin/pack-redundant.c:596, builtin/pack-refs.c:10, builtin/patch-id.c:60, builtin/patch-id.c:149, builtin/remote.c:1512, builtin/remote-ext.c:240, builtin/remote-fd.c:53, builtin/reset.c:236, builtin/send-pack.c:384, builtin/unpack-file.c:25, builtin/var.c:75 - These files have symbols which should be marked static since they're only file scope: submodule.c:12, diff.c:631, replace_object.c:92, submodule.c:13, submodule.c:14, trace.c:78, transport.c:195, transport-helper.c:79, unpack-trees.c:19, url.c:3, url.c:18, url.c:104, url.c:117, url.c:123, url.c:129, url.c:136, thread-utils.c:21, thread-utils.c:48 - These files redeclare symbols to be different types: builtin/index-pack.c:210, parse-options.c:564, parse-options.c:571, usage.c:49, usage.c:58, usage.c:63, usage.c:72 - These files use a literal integer 0 when they really should use a NULL pointer: daemon.c:663, fast-import.c:2942, imap-send.c:1072, notes-merge.c:362 While we're in the area, clean up some unused #includes in builtin files (mostly exec_cmd.h). Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-22 08:51:05 +01:00			`static NORETURN void bad_object(unsigned long offset, const char *format, ...)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
			`va_list params;`
			`char buf[1024];`

			`va_start(params, format);`
			`vsnprintf(buf, sizeof(buf), format, params);`
			`va_end(params);`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("pack has bad object at offset %lu: %s"), offset, buf);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`

index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`static inline struct thread_local *get_thread_data(void)`
			`{`
			`#ifndef NO_PTHREADS`
			`if (threads_active)`
			`return pthread_getspecific(key);`
			`assert(!threads_active &&`
			`"This should only be reached when all threads are gone");`
			`#endif`
			`return &nothread_data;`
			`}`

			`#ifndef NO_PTHREADS`
			`static void set_thread_data(struct thread_local *data)`
			`{`
			`if (threads_active)`
			`pthread_setspecific(key, data);`
			`}`
			`#endif`

index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`static struct base_data *alloc_base_data(void)`
			`{`
			`struct base_data *base = xmalloc(sizeof(struct base_data));`
			`memset(base, 0, sizeof(*base));`
			`base->ref_last = -1;`
			`base->ofs_last = -1;`
			`return base;`
			`}`

index-pack: smarter memory usage during delta resolution There is no need to keep the base object data around after its last delta has been resolved. This also means that long delta chains with only one delta per base won't grow the cache size unnecessarily as the base will be freed before recursing down. To make it easy, find_delta_children() is modified so the first and last indices are initialized in all cases. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:58 +02:00			`static void free_base_data(struct base_data *c)`
			`{`
			`if (c->data) {`
			`free(c->data);`
			`c->data = NULL;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`get_thread_data()->base_cache_used -= c->size;`
index-pack: smarter memory usage during delta resolution There is no need to keep the base object data around after its last delta has been resolved. This also means that long delta chains with only one delta per base won't grow the cache size unnecessarily as the base will be freed before recursing down. To make it easy, find_delta_children() is modified so the first and last indices are initialized in all cases. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:58 +02:00			`}`
			`}`

index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`static void prune_base_data(struct base_data *retain)`
			`{`
Fix various dead stores found by the clang static analyzer http-push.c::finish_request(): request is initialized by the for loop index-pack.c::free_base_data(): b is initialized by the for loop merge-recursive.c::process_renames(): move compare to narrower scope, and remove unused assignments to it remove unused variable renames2 xdiff/xdiffi.c::xdl_recs_cmp(): remove unused variable ec xdiff/xemit.c::xdl_emit_diff(): xche is always overwritten Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-15 22:01:20 +01:00			`struct base_data *b;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`struct thread_local *data = get_thread_data();`
			`for (b = data->base_cache;`
			`data->base_cache_used > delta_base_cache_limit && b;`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`b = b->child) {`
index-pack: smarter memory usage during delta resolution There is no need to keep the base object data around after its last delta has been resolved. This also means that long delta chains with only one delta per base won't grow the cache size unnecessarily as the base will be freed before recursing down. To make it easy, find_delta_children() is modified so the first and last indices are initialized in all cases. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:58 +02:00			`if (b->data && b != retain)`
			`free_base_data(b);`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`}`
			`}`

index-pack: Chain the struct base_data on the stack for traversal We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:45 +02:00			`static void link_base_data(struct base_data base, struct base_data c)`
			`{`
			`if (base)`
			`base->child = c;`
			`else`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`get_thread_data()->base_cache = c;`
index-pack: Chain the struct base_data on the stack for traversal We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:45 +02:00
			`c->base = base;`
			`c->child = NULL;`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`if (c->data)`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`get_thread_data()->base_cache_used += c->size;`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`prune_base_data(c);`
index-pack: Chain the struct base_data on the stack for traversal We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:45 +02:00			`}`

			`static void unlink_base_data(struct base_data *c)`
			`{`
			`struct base_data *base = c->base;`
			`if (base)`
			`base->child = NULL;`
			`else`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`get_thread_data()->base_cache = NULL;`
index-pack: smarter memory usage during delta resolution There is no need to keep the base object data around after its last delta has been resolved. This also means that long delta chains with only one delta per base won't grow the cache size unnecessarily as the base will be freed before recursing down. To make it easy, find_delta_children() is modified so the first and last indices are initialized in all cases. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:58 +02:00			`free_base_data(c);`
index-pack: Chain the struct base_data on the stack for traversal We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:45 +02:00			`}`

index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`static int is_delta_type(enum object_type type)`
			`{`
			`return (type == OBJ_REF_DELTA \|\| type == OBJ_OFS_DELTA);`
			`}`

			`static void *unpack_entry_data(unsigned long offset, unsigned long size,`
			`enum object_type type, unsigned char *sha1)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`static char fixed_buf[8192];`
index-pack: rationalize unpack_entry_data() Rework the loop to remove duplicated calls to use() and fill(), and to make the code easier to read. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:12:06 +02:00			`int status;`
zlib: zlib can only process 4GB at a time The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 20:52:15 +02:00			`git_zstream stream;`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`void *buf;`
index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`git_SHA_CTX c;`
			`char hdr[32];`
			`int hdrlen;`

			`if (!is_delta_type(type)) {`
			`hdrlen = sprintf(hdr, "%s %lu", typename(type), size) + 1;`
			`git_SHA1_Init(&c);`
			`git_SHA1_Update(&c, hdr, hdrlen);`
			`} else`
			`sha1 = NULL;`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`if (type == OBJ_BLOB && size > big_file_threshold)`
			`buf = fixed_buf;`
			`else`
			`buf = xmalloc(size);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
			`memset(&stream, 0, sizeof(stream));`
index-pack: rationalize unpack_entry_data() Rework the loop to remove duplicated calls to use() and fill(), and to make the code easier to read. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:12:06 +02:00			`git_inflate_init(&stream);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`stream.next_out = buf;`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`stream.avail_out = buf == fixed_buf ? sizeof(fixed_buf) : size;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
index-pack: rationalize unpack_entry_data() Rework the loop to remove duplicated calls to use() and fill(), and to make the code easier to read. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:12:06 +02:00			`do {`
index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`unsigned char *last_out = stream.next_out;`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`stream.next_in = fill(1);`
			`stream.avail_in = input_len;`
index-pack: rationalize unpack_entry_data() Rework the loop to remove duplicated calls to use() and fill(), and to make the code easier to read. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:12:06 +02:00			`status = git_inflate(&stream, 0);`
			`use(input_len - stream.avail_in);`
index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`if (sha1)`
			`git_SHA1_Update(&c, last_out, stream.next_out - last_out);`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`if (buf == fixed_buf) {`
			`stream.next_out = buf;`
			`stream.avail_out = sizeof(fixed_buf);`
			`}`
index-pack: rationalize unpack_entry_data() Rework the loop to remove duplicated calls to use() and fill(), and to make the code easier to read. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:12:06 +02:00			`} while (status == Z_OK);`
			`if (stream.total_out != size \|\| status != Z_STREAM_END)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(offset, _("inflate returned %d"), status);`
Wrap inflate and other zlib routines for better error reporting R. Tyler Ballance reported a mysterious transient repository corruption; after much digging, it turns out that we were not catching and reporting memory allocation errors from some calls we make to zlib. This one _just_ wraps things; it doesn't do the "retry on low memory error" part, at least not yet. It is an independent issue from the reporting. Some of the errors are expected and passed back to the caller, but we die when zlib reports it failed to allocate memory for now. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-08 04:54:47 +01:00			`git_inflate_end(&stream);`
index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`if (sha1)`
			`git_SHA1_Final(sha1, &c);`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`return buf == fixed_buf ? NULL : buf;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`

index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`static void unpack_raw_entry(struct object_entry obj,`
			`union delta_base *delta_base,`
			`unsigned char *sha1)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
Fix big left-shifts of unsigned char Shifting 'unsigned char' or 'unsigned short' left can result in sign extension errors, since the C integer promotion rules means that the unsigned char/short will get implicitly promoted to a signed 'int' due to the shift (or due to other operations). This normally doesn't matter, but if you shift things up sufficiently, it will now set the sign bit in 'int', and a subsequent cast to a bigger type (eg 'long' or 'unsigned long') will now sign-extend the value despite the original expression being unsigned. One example of this would be something like unsigned long size; unsigned char c; size += c << 24; where despite all the variables being unsigned, 'c << 24' ends up being a signed entity, and will get sign-extended when then doing the addition in an 'unsigned long' type. Since git uses 'unsigned char' pointers extensively, we actually have this bug in a couple of places. I may have missed some, but this is the result of looking at git grep '[^0-9 ][ ]<<[ ][a-z]' -- '.c' '.h' git grep '<<[ ]24' which catches at least the common byte cases (shifting variables by a variable amount, and shifting by 24 bits). I also grepped for just 'unsigned char' variables in general, and converted the ones that most obviously ended up getting implicitly cast immediately anyway (eg hash_name(), encode_85()). In addition to just avoiding 'unsigned char', this patch also tries to use a common idiom for the delta header size thing. We had three different variations on it: "& 0x7fUL" in one place (getting the sign extension right), and "& ~0x80" and "& 0x7f" in two other places (not getting it right). Apart from making them all just avoid using "unsigned char" at all, I also unified them to then use a simple "& 0x7f". I considered making a sparse extension which warns about doing implicit casts from unsigned types to signed types, but it gets rather complex very quickly, so this is just a hack. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-18 02:22:27 +02:00			`unsigned char *p;`
			`unsigned long size, c;`
add overflow tests on pack offset variables Change a few size and offset variables to more appropriate type, then add overflow tests on those offsets. This prevents any bad data to be generated/processed if off_t happens to not be large enough to handle some big packs. Better be safe than sorry. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:30 +02:00			`off_t base_offset;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`unsigned shift;`
compute object CRC32 with index-pack Same as previous patch but for index-pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:32 +02:00			`void *data;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`obj->idx.offset = consumed_bytes;`
sparse: Fix errors and silence warnings * load_file() returns a void pointer but is using 0 for the return value * builtin/receive-pack.c forgot to include builtin.h * packet_trace_prefix can be marked static * ll_merge takes a pointer for its last argument, not an int * crc32 expects a pointer as the second argument but Z_NULL is defined to be 0 (see 38f4d13 sparse fix: Using plain integer as NULL pointer, 2006-11-18 for more info) Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-04-03 09:06:54 +02:00			`input_crc32 = crc32(0, NULL, 0);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00
			`p = fill(1);`
			`c = *p;`
			`use(1);`
			`obj->type = (c >> 4) & 7;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`size = (c & 15);`
			`shift = 4;`
			`while (c & 0x80) {`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`p = fill(1);`
			`c = *p;`
			`use(1);`
Fix big left-shifts of unsigned char Shifting 'unsigned char' or 'unsigned short' left can result in sign extension errors, since the C integer promotion rules means that the unsigned char/short will get implicitly promoted to a signed 'int' due to the shift (or due to other operations). This normally doesn't matter, but if you shift things up sufficiently, it will now set the sign bit in 'int', and a subsequent cast to a bigger type (eg 'long' or 'unsigned long') will now sign-extend the value despite the original expression being unsigned. One example of this would be something like unsigned long size; unsigned char c; size += c << 24; where despite all the variables being unsigned, 'c << 24' ends up being a signed entity, and will get sign-extended when then doing the addition in an 'unsigned long' type. Since git uses 'unsigned char' pointers extensively, we actually have this bug in a couple of places. I may have missed some, but this is the result of looking at git grep '[^0-9 ][ ]<<[ ][a-z]' -- '.c' '.h' git grep '<<[ ]24' which catches at least the common byte cases (shifting variables by a variable amount, and shifting by 24 bits). I also grepped for just 'unsigned char' variables in general, and converted the ones that most obviously ended up getting implicitly cast immediately anyway (eg hash_name(), encode_85()). In addition to just avoiding 'unsigned char', this patch also tries to use a common idiom for the delta header size thing. We had three different variations on it: "& 0x7fUL" in one place (getting the sign extension right), and "& ~0x80" and "& 0x7f" in two other places (not getting it right). Apart from making them all just avoid using "unsigned char" at all, I also unified them to then use a simple "& 0x7f". I considered making a sparse extension which warns about doing implicit casts from unsigned types to signed types, but it gets rather complex very quickly, so this is just a hack. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-18 02:22:27 +02:00			`size += (c & 0x7f) << shift;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`shift += 7;`
			`}`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`obj->size = size;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`switch (obj->type) {`
introduce delta objects with offset to base This adds a new object, namely OBJ_OFS_DELTA, renames OBJ_DELTA to OBJ_REF_DELTA to better make the distinction between those two delta objects, and adds support for the handling of those new delta objects in sha1_file.c only. The OBJ_OFS_DELTA contains a relative offset from the delta object's position in a pack instead of the 20-byte SHA1 reference to identify the base object. Since the base is likely to be not so far away, the relative offset is more likely to have a smaller encoding on average than an absolute offset. And for those delta objects the base must always be stored first because there is no way to know the distance of later objects when streaming a pack. Hence this relative offset is always meant to be negative. The offset encoding is slightly denser than the one used for object size -- credits to <linux@horizon.com> (whoever this is) for bringing it to my attention. This allows for pack size reduction between 3.2% (Linux-2.6) to over 5% (linux-historic). Runtime pack access should be faster too since delta replay does skip a search in the pack index for each delta in a chain. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:06:49 +02:00			`case OBJ_REF_DELTA:`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`hashcpy(delta_base->sha1, fill(20));`
			`use(20);`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`break;`
			`case OBJ_OFS_DELTA:`
			`memset(delta_base, 0, sizeof(*delta_base));`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`p = fill(1);`
			`c = *p;`
			`use(1);`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`base_offset = c & 127;`
			`while (c & 128) {`
			`base_offset += 1;`
make overflow test on delta base offset work regardless of variable size This patch introduces the MSB() macro to obtain the desired number of most significant bits from a given variable independently of the variable type. It is then used to better implement the overflow test on the OBJ_OFS_DELTA base offset variable with the property of always working correctly regardless of the type/size of that variable. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:29 +02:00			`if (!base_offset \|\| MSB(base_offset, 7))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(obj->idx.offset, _("offset value overflow for delta base object"));`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`p = fill(1);`
			`c = *p;`
			`use(1);`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`base_offset = (base_offset << 7) + (c & 127);`
			`}`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`delta_base->offset = obj->idx.offset - base_offset;`
better validation on delta base object offsets In one case, it was possible to have a bad offset equal to 0 effectively pointing a delta onto itself and crashing git after too many recursions. In the other cases, a negative offset could result due to off_t being signed. Catch those. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-30 00:02:45 +01:00			`if (delta_base->offset <= 0 \|\| delta_base->offset >= obj->idx.offset)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(obj->idx.offset, _("delta base offset is out of bound"));`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`break;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`case OBJ_COMMIT:`
			`case OBJ_TREE:`
			`case OBJ_BLOB:`
			`case OBJ_TAG:`
			`break;`
			`default:`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(obj->idx.offset, _("unknown object type %d"), obj->type);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`obj->hdr_size = consumed_bytes - obj->idx.offset;`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00
index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`data = unpack_entry_data(obj->idx.offset, obj->size, obj->type, sha1);`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`obj->idx.crc32 = input_crc32;`
compute object CRC32 with index-pack Same as previous patch but for index-pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:32 +02:00			`return data;`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`}`

index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`static void unpack_data(struct object_entry obj,`
			`int (consume)(const unsigned char , unsigned long, void *),`
			`void *cb_data)`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`{`
fix index-pack with packs >4GB containing deltas on 32-bit machines This probably hasn't been properly tested before. Here's a script to create a 8GB repo with the necessary characteristics (copy the test-genrandom executable from the Git build tree to /tmp first): ----- #!/bin/bash git init git config core.compression 0 # create big objects with no deltas for i in $(seq -w 1 2 63) do echo $i /tmp/test-genrandom $i 268435456 > file_$i git add file_$i rm file_$i echo "file_$i -delta" >> .gitattributes done # create "deltifiable" objects in between big objects for i in $(seq -w 2 2 64) do echo "$i $i $i" >> grow cp grow file_$i git add file_$i rm file_$i done rm grow # create a pack with them git commit -q -m "commit of big objects interlaced with small deltas" git repack -a -d ----- Then clone this repo over the Git protocol. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-11 05:29:10 +01:00			`off_t from = obj[0].idx.offset + obj[0].hdr_size;`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`unsigned long len = obj[1].idx.offset - from;`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`unsigned char data, inbuf;`
zlib: zlib can only process 4GB at a time The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 20:52:15 +02:00			`git_zstream stream;`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`int status;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`data = xmalloc(consume ? 64*1024 : obj->size);`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`inbuf = xmalloc((len < 641024) ? len : 641024);`

add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`memset(&stream, 0, sizeof(stream));`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`git_inflate_init(&stream);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`stream.next_out = data;`
index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`stream.avail_out = consume ? 64*1024 : obj->size;`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00
			`do {`
			`ssize_t n = (len < 641024) ? len : 641024;`
			`n = pread(pack_fd, inbuf, n, from);`
			`if (n < 0)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("cannot pread pack file"));`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`if (!n)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(Q_("premature end of pack file, %lu byte missing",`
			`"premature end of pack file, %lu bytes missing",`
			`len),`
			`len);`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`from += n;`
			`len -= n;`
			`stream.next_in = inbuf;`
			`stream.avail_in = n;`
index-pack: loop while inflating objects in unpack_data When the unpack_data function is given a consume() callback, it unpacks only 64K of the input at a time, feeding it to git_inflate along with a 64K output buffer. However, because we are inflating, there is a good chance that the output buffer will fill before consuming all of the input. In this case, we need to loop on git_inflate until we have fed the whole input buffer, feeding each chunk of output to the consume buffer. The current code does not do this, and as a result, will fail the loop condition and trigger a fatal "serious inflate inconsistency" error in this case. While we're rearranging the loop, let's get rid of the extra last_out pointer. It is meant to point to the beginning of the buffer that we feed to git_inflate, but in practice this is always the beginning of our same 64K buffer, because: 1. At the beginning of the loop, we are feeding the buffer. 2. At the end of the loop, if we are using a consume() function, we reset git_inflate's pointer to the beginning of the buffer. If we are not using a consume() function, then we do not care about the value of last_out at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-04 09:12:14 +02:00			`if (!consume)`
			`status = git_inflate(&stream, 0);`
			`else {`
			`do {`
			`status = git_inflate(&stream, 0);`
			`if (consume(data, stream.next_out - data, cb_data)) {`
			`free(inbuf);`
			`free(data);`
			`return NULL;`
			`}`
			`stream.next_out = data;`
			`stream.avail_out = 64*1024;`
			`} while (status == Z_OK && stream.avail_in);`
index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`}`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`} while (len && status == Z_OK && !stream.avail_in);`

			`/* This has been inflated OK when first encountered, so... */`
			`if (status != Z_STREAM_END \|\| stream.total_out != obj->size)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("serious inflate inconsistency"));`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00
			`git_inflate_end(&stream);`
			`free(inbuf);`
index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`if (consume) {`
			`free(data);`
			`data = NULL;`
			`}`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`return data;`
			`}`

index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`static void get_data_from_pack(struct object_entry obj)`
			`{`
			`return unpack_data(obj, NULL, NULL);`
			`}`

index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`static int compare_delta_bases(const union delta_base *base1,`
			`const union delta_base *base2,`
			`enum object_type type1,`
			`enum object_type type2)`
			`{`
			`int cmp = type1 - type2;`
			`if (cmp)`
			`return cmp;`
			`return memcmp(base1, base2, UNION_BASE_SZ);`
			`}`

			`static int find_delta(const union delta_base *base, enum object_type type)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
			`int first = 0, last = nr_deltas;`

			`while (first < last) {`
			`int next = (first + last) / 2;`
			`struct delta_entry *delta = &deltas[next];`
			`int cmp;`

index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`cmp = compare_delta_bases(base, &delta->base,`
			`type, objects[delta->obj_no].type);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`if (!cmp)`
			`return next;`
			`if (cmp < 0) {`
			`last = next;`
			`continue;`
			`}`
			`first = next+1;`
			`}`
			`return -first-1;`
			`}`

index-pack: smarter memory usage during delta resolution There is no need to keep the base object data around after its last delta has been resolved. This also means that long delta chains with only one delta per base won't grow the cache size unnecessarily as the base will be freed before recursing down. To make it easy, find_delta_children() is modified so the first and last indices are initialized in all cases. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:58 +02:00			`static void find_delta_children(const union delta_base *base,`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`int first_index, int last_index,`
			`enum object_type type)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`int first = find_delta(base, type);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`int last = first;`
			`int end = nr_deltas - 1;`

index-pack: smarter memory usage during delta resolution There is no need to keep the base object data around after its last delta has been resolved. This also means that long delta chains with only one delta per base won't grow the cache size unnecessarily as the base will be freed before recursing down. To make it easy, find_delta_children() is modified so the first and last indices are initialized in all cases. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:58 +02:00			`if (first < 0) {`
			`*first_index = 0;`
			`*last_index = -1;`
			`return;`
			`}`
index-pack: compare only the first 20-bytes of the key. The "union delta_base" is a strange beast. It is a 20-byte binary blob key to search a binary searchable deltas[] array, each element of which uses it to represent its base object with either a full 20-byte SHA-1 or an offset in the pack. Which representation is used is determined by another field of the deltas[] array element, obj->type, so there is no room for confusion, as long as we make sure we compare the keys for the same type only with appropriate length. The code compared the full union with memcmp(). When storing the in-pack offset, the union was first cleared before storing an unsigned long, so comparison worked fine. On 64-bit architectures, however, the union typically is 24-byte long; the code did not clear the remaining 4-byte alignment padding when storing a full 20-byte SHA-1 representation. Using memcmp() to compare the whole union was wrong. This fixes the comparison to look at the first 20-bytes of the union, regardless of the architecture. As long as ulong is smaller than 20-bytes this works fine. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-17 22:23:26 +02:00			`while (first > 0 && !memcmp(&deltas[first - 1].base, base, UNION_BASE_SZ))`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`--first;`
index-pack: compare only the first 20-bytes of the key. The "union delta_base" is a strange beast. It is a 20-byte binary blob key to search a binary searchable deltas[] array, each element of which uses it to represent its base object with either a full 20-byte SHA-1 or an offset in the pack. Which representation is used is determined by another field of the deltas[] array element, obj->type, so there is no room for confusion, as long as we make sure we compare the keys for the same type only with appropriate length. The code compared the full union with memcmp(). When storing the in-pack offset, the union was first cleared before storing an unsigned long, so comparison worked fine. On 64-bit architectures, however, the union typically is 24-byte long; the code did not clear the remaining 4-byte alignment padding when storing a full 20-byte SHA-1 representation. Using memcmp() to compare the whole union was wrong. This fixes the comparison to look at the first 20-bytes of the union, regardless of the architecture. As long as ulong is smaller than 20-bytes this works fine. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-17 22:23:26 +02:00			`while (last < end && !memcmp(&deltas[last + 1].base, base, UNION_BASE_SZ))`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`++last;`
			`*first_index = first;`
			`*last_index = last;`
			`}`

index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`struct compare_data {`
			`struct object_entry *entry;`
			`struct git_istream *st;`
			`unsigned char *buf;`
			`unsigned long buf_size;`
			`};`

			`static int compare_objects(const unsigned char *buf, unsigned long size,`
			`void *cb_data)`
			`{`
			`struct compare_data *data = cb_data;`

			`if (data->buf_size < size) {`
			`free(data->buf);`
			`data->buf = xmalloc(size);`
			`data->buf_size = size;`
			`}`

			`while (size) {`
			`ssize_t len = read_istream(data->st, data->buf, size);`
			`if (len == 0)`
			`die(_("SHA1 COLLISION FOUND WITH %s !"),`
			`sha1_to_hex(data->entry->idx.sha1));`
			`if (len < 0)`
			`die(_("unable to read %s"),`
			`sha1_to_hex(data->entry->idx.sha1));`
			`if (memcmp(buf, data->buf, len))`
			`die(_("SHA1 COLLISION FOUND WITH %s !"),`
			`sha1_to_hex(data->entry->idx.sha1));`
			`size -= len;`
			`buf += len;`
			`}`
			`return 0;`
			`}`

			`static int check_collison(struct object_entry *entry)`
			`{`
			`struct compare_data data;`
			`enum object_type type;`
			`unsigned long size;`

			`if (entry->size <= big_file_threshold \|\| entry->type != OBJ_BLOB)`
			`return -1;`

			`memset(&data, 0, sizeof(data));`
			`data.entry = entry;`
			`data.st = open_istream(entry->idx.sha1, &type, &size, NULL);`
			`if (!data.st)`
			`return -1;`
			`if (size != entry->size \|\| type != entry->type)`
			`die(_("SHA1 COLLISION FOUND WITH %s !"),`
			`sha1_to_hex(entry->idx.sha1));`
			`unpack_data(entry, compare_objects, &data);`
			`close_istream(data.st);`
			`free(data.buf);`
			`return 0;`
			`}`

index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`static void sha1_object(const void data, struct object_entry obj_entry,`
			`unsigned long size, enum object_type type,`
			`const unsigned char *sha1)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`void *new_data = NULL;`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`int collision_test_needed;`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00
			`assert(data \|\| obj_entry);`

index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`read_lock();`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`collision_test_needed = has_sha1_file(sha1);`
			`read_unlock();`

			`if (collision_test_needed && !data) {`
			`read_lock();`
			`if (!check_collison(obj_entry))`
			`collision_test_needed = 0;`
			`read_unlock();`
			`}`
			`if (collision_test_needed) {`
don't ever allow SHA1 collisions to exist by fetching a pack Waaaaaaay back Git was considered to be secure as it never overwrote an object it already had. This was ensured by always unpacking the packfile received over the network (both in fetch and receive-pack) and our already existing logic to not create a loose object for an object we already have. Lately however we keep "large-ish" packfiles on both fetch and push by running them through index-pack instead of unpack-objects. This would let an attacker perform a birthday attack. How? Assume the attacker knows a SHA-1 that has two different data streams. He knows the client is likely to have the "good" one. So he sends the "evil" variant to the other end as part of a "large-ish" packfile. The recipient keeps that packfile, and indexes it. Now since this is a birthday attack there is a SHA-1 collision; two objects exist in the repository with the same SHA-1. They have very different data streams. One of them is "evil". Currently the poor recipient cannot tell the two objects apart, short of by examining the timestamp of the packfiles. But lets say the recipient repacks before he realizes he's been attacked. We may wind up packing the "evil" version of the object, and deleting the "good" one. This is made even more likely by Junio's recent rearrange_packed_git patch (b867092f). It is extremely unlikely for a SHA1 collisions to occur, but if it ever happens with a remote (hence untrusted) object we simply must not let the fetch succeed. Normally received packs should not contain objects we already have. But when they do we must ensure duplicated objects with the same SHA1 actually contain the same data. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-20 20:32:35 +01:00			`void *has_data;`
			`enum object_type has_type;`
			`unsigned long has_size;`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`read_lock();`
			`has_type = sha1_object_info(sha1, &has_size);`
			`if (has_type != type \|\| has_size != size)`
			`die(_("SHA1 COLLISION FOUND WITH %s !"), sha1_to_hex(sha1));`
don't ever allow SHA1 collisions to exist by fetching a pack Waaaaaaay back Git was considered to be secure as it never overwrote an object it already had. This was ensured by always unpacking the packfile received over the network (both in fetch and receive-pack) and our already existing logic to not create a loose object for an object we already have. Lately however we keep "large-ish" packfiles on both fetch and push by running them through index-pack instead of unpack-objects. This would let an attacker perform a birthday attack. How? Assume the attacker knows a SHA-1 that has two different data streams. He knows the client is likely to have the "good" one. So he sends the "evil" variant to the other end as part of a "large-ish" packfile. The recipient keeps that packfile, and indexes it. Now since this is a birthday attack there is a SHA-1 collision; two objects exist in the repository with the same SHA-1. They have very different data streams. One of them is "evil". Currently the poor recipient cannot tell the two objects apart, short of by examining the timestamp of the packfiles. But lets say the recipient repacks before he realizes he's been attacked. We may wind up packing the "evil" version of the object, and deleting the "good" one. This is made even more likely by Junio's recent rearrange_packed_git patch (b867092f). It is extremely unlikely for a SHA1 collisions to occur, but if it ever happens with a remote (hence untrusted) object we simply must not let the fetch succeed. Normally received packs should not contain objects we already have. But when they do we must ensure duplicated objects with the same SHA1 actually contain the same data. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-20 20:32:35 +01:00			`has_data = read_sha1_file(sha1, &has_type, &has_size);`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`read_unlock();`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`if (!data)`
			`data = new_data = get_data_from_pack(obj_entry);`
don't ever allow SHA1 collisions to exist by fetching a pack Waaaaaaay back Git was considered to be secure as it never overwrote an object it already had. This was ensured by always unpacking the packfile received over the network (both in fetch and receive-pack) and our already existing logic to not create a loose object for an object we already have. Lately however we keep "large-ish" packfiles on both fetch and push by running them through index-pack instead of unpack-objects. This would let an attacker perform a birthday attack. How? Assume the attacker knows a SHA-1 that has two different data streams. He knows the client is likely to have the "good" one. So he sends the "evil" variant to the other end as part of a "large-ish" packfile. The recipient keeps that packfile, and indexes it. Now since this is a birthday attack there is a SHA-1 collision; two objects exist in the repository with the same SHA-1. They have very different data streams. One of them is "evil". Currently the poor recipient cannot tell the two objects apart, short of by examining the timestamp of the packfiles. But lets say the recipient repacks before he realizes he's been attacked. We may wind up packing the "evil" version of the object, and deleting the "good" one. This is made even more likely by Junio's recent rearrange_packed_git patch (b867092f). It is extremely unlikely for a SHA1 collisions to occur, but if it ever happens with a remote (hence untrusted) object we simply must not let the fetch succeed. Normally received packs should not contain objects we already have. But when they do we must ensure duplicated objects with the same SHA1 actually contain the same data. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-20 20:32:35 +01:00			`if (!has_data)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("cannot read existing object %s"), sha1_to_hex(sha1));`
don't ever allow SHA1 collisions to exist by fetching a pack Waaaaaaay back Git was considered to be secure as it never overwrote an object it already had. This was ensured by always unpacking the packfile received over the network (both in fetch and receive-pack) and our already existing logic to not create a loose object for an object we already have. Lately however we keep "large-ish" packfiles on both fetch and push by running them through index-pack instead of unpack-objects. This would let an attacker perform a birthday attack. How? Assume the attacker knows a SHA-1 that has two different data streams. He knows the client is likely to have the "good" one. So he sends the "evil" variant to the other end as part of a "large-ish" packfile. The recipient keeps that packfile, and indexes it. Now since this is a birthday attack there is a SHA-1 collision; two objects exist in the repository with the same SHA-1. They have very different data streams. One of them is "evil". Currently the poor recipient cannot tell the two objects apart, short of by examining the timestamp of the packfiles. But lets say the recipient repacks before he realizes he's been attacked. We may wind up packing the "evil" version of the object, and deleting the "good" one. This is made even more likely by Junio's recent rearrange_packed_git patch (b867092f). It is extremely unlikely for a SHA1 collisions to occur, but if it ever happens with a remote (hence untrusted) object we simply must not let the fetch succeed. Normally received packs should not contain objects we already have. But when they do we must ensure duplicated objects with the same SHA1 actually contain the same data. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-20 20:32:35 +01:00			`if (size != has_size \|\| type != has_type \|\|`
			`memcmp(data, has_data, size) != 0)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("SHA1 COLLISION FOUND WITH %s !"), sha1_to_hex(sha1));`
Plug memory leak in index-pack collision checking codepath. 2007-04-03 18:33:46 +02:00			`free(has_data);`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`}`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`if (strict) {`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`read_lock();`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`if (type == OBJ_BLOB) {`
			`struct blob *blob = lookup_blob(sha1);`
			`if (blob)`
			`blob->object.flags \|= FLAG_CHECKED;`
			`else`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("invalid blob object %s"), sha1_to_hex(sha1));`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`} else {`
			`struct object *obj;`
			`int eaten;`
			`void buf = (void ) data;`

index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`if (!buf)`
			`buf = new_data = get_data_from_pack(obj_entry);`

index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`/*`
			`* we do not need to free the memory here, as the`
			`* buf is deleted by the caller.`
			`*/`
			`obj = parse_object_buffer(sha1, type, size, buf, &eaten);`
			`if (!obj)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("invalid %s"), typename(type));`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`if (fsck_object(obj, 1, fsck_error_function))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("Error in object"));`
Fix various sparse warnings in the git source code There are a few remaining ones, but this fixes the trivial ones. It boils down to two main issues that sparse complains about: - warning: Using plain integer as NULL pointer Sparse doesn't like you using '0' instead of 'NULL'. For various good reasons, not the least of which is just the visual confusion. A NULL pointer is not an integer, and that whole "0 works as NULL" is a historical accident and not very pretty. A few of these remain: zlib is a total mess, and Z_NULL is just a 0. I didn't touch those. - warning: symbol 'xyz' was not declared. Should it be static? Sparse wants to see declarations for any functions you export. A lack of a declaration tends to mean that you should either add one, or you should mark the function 'static' to show that it's in file scope. A few of these remain: I only did the ones that should obviously just be made static. That 'wt_status_submodule_summary' one is debatable. It has a few related flags (like 'wt_status_use_color') which _are_ declared, and are used by builtin-commit.c. So maybe we'd like to export it at some point, but it's not declared now, and not used outside of that file, so 'static' it is in this patch. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-18 19:28:43 +02:00			`if (fsck_walk(obj, mark_link, NULL))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00
			`if (obj->type == OBJ_TREE) {`
			`struct tree item = (struct tree ) obj;`
			`item->buffer = NULL;`
			`}`
			`if (obj->type == OBJ_COMMIT) {`
			`struct commit commit = (struct commit ) obj;`
			`commit->buffer = NULL;`
			`}`
			`obj->flags \|= FLAG_CHECKED;`
			`}`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`read_unlock();`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`}`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00
			`free(new_data);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`

index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`/*`
			`* This function is part of find_unresolved_deltas(). There are two`
			`* walkers going in the opposite ways.`
			`*`
			`* The first one in find_unresolved_deltas() traverses down from`
			`* parent node to children, deflating nodes along the way. However,`
			`* memory for deflated nodes is limited by delta_base_cache_limit, so`
			`* at some point parent node's deflated content may be freed.`
			`*`
			`* The second walker is this function, which goes from current node up`
			`* to top parent if necessary to deflate the node. In normal`
			`* situation, its parent node would be already deflated, so it just`
			`* needs to apply delta.`
			`*`
			`* In the worst case scenario, parent node is no longer deflated because`
			`* we're running out of delta_base_cache_limit; we need to re-deflate`
			`* parents, possibly up to the top base.`
			`*`
			`* All deflated objects here are subject to be freed if we exceed`
			`* delta_base_cache_limit, just like in find_unresolved_deltas(), we`
			`* just need to make sure the last node is not freed.`
			`*/`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`static void get_base_data(struct base_data c)`
			`{`
			`if (!c->data) {`
			`struct object_entry *obj = c->obj;`
index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`struct base_data **delta = NULL;`
			`int delta_nr = 0, delta_alloc = 0;`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00
index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`while (is_delta_type(c->obj->type) && !c->data) {`
			`ALLOC_GROW(delta, delta_nr + 1, delta_alloc);`
			`delta[delta_nr++] = c;`
			`c = c->base;`
			`}`
			`if (!delta_nr) {`
			`c->data = get_data_from_pack(obj);`
			`c->size = obj->size;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`get_thread_data()->base_cache_used += c->size;`
index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`prune_base_data(c);`
			`}`
			`for (; delta_nr > 0; delta_nr--) {`
			`void base, raw;`
			`c = delta[delta_nr - 1];`
			`obj = c->obj;`
			`base = get_base_data(c->base);`
			`raw = get_data_from_pack(obj);`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`c->data = patch_delta(`
			`base, c->base->size,`
			`raw, obj->size,`
			`&c->size);`
			`free(raw);`
			`if (!c->data)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(obj->idx.offset, _("failed to apply delta"));`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`get_thread_data()->base_cache_used += c->size;`
index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`prune_base_data(c);`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`}`
index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`free(delta);`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`}`
			`return c->data;`
			`}`

index-pack: Refactor base arguments of resolve_delta into a struct We need to discard base objects which are not recently used if our memory gets low, such as when we are unpacking a long delta chain of a very large object. To support tracking the available base objects we combine the pointer and size into a struct. Future changes would allow the data pointer to be free'd and marked NULL if memory gets low. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:44 +02:00			`static void resolve_delta(struct object_entry *delta_obj,`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`struct base_data base, struct base_data result)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
fix multiple issues in index-pack Since commit 9441b61dc5, two issues affected correct behavior of index-pack: 1) The real_type of a delta object is the 'real_type' of its base, not the 'type' which can be a "delta type". Consequence of this is a corrupted pack index file which only needs to be recreated with a good index-pack command ('git verify-pack' will flag those). 2) The code sequence: result->data = patch_delta(get_base_data(base), base->obj->size, delta_data, delta_size, &result->size); has two issues of its own since base->obj->size should instead be base->size as we want the size of the actual object data and not the size of the delta object it is represented by. Except that simply replacing base->obj->size with base->size won't make the code more correct as the C language doesn't enforce a particular ordering for the evaluation of needed arguments for a function call, hence base->size could be pushed on the stack before get_base_data() which initializes base->size is called. Signed-off-by: Nicolas Pitre <nico@cam.org> Tested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-20 22:46:19 +02:00			`void base_data, delta_data;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
fix multiple issues in index-pack Since commit 9441b61dc5, two issues affected correct behavior of index-pack: 1) The real_type of a delta object is the 'real_type' of its base, not the 'type' which can be a "delta type". Consequence of this is a corrupted pack index file which only needs to be recreated with a good index-pack command ('git verify-pack' will flag those). 2) The code sequence: result->data = patch_delta(get_base_data(base), base->obj->size, delta_data, delta_size, &result->size); has two issues of its own since base->obj->size should instead be base->size as we want the size of the actual object data and not the size of the delta object it is represented by. Except that simply replacing base->obj->size with base->size won't make the code more correct as the C language doesn't enforce a particular ordering for the evaluation of needed arguments for a function call, hence base->size could be pushed on the stack before get_base_data() which initializes base->size is called. Signed-off-by: Nicolas Pitre <nico@cam.org> Tested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-20 22:46:19 +02:00			`delta_obj->real_type = base->obj->real_type;`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`delta_obj->delta_depth = base->obj->delta_depth + 1;`
index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00			`if (deepest_delta < delta_obj->delta_depth)`
			`deepest_delta = delta_obj->delta_depth;`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`delta_obj->base_object_no = base->obj - objects;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`delta_data = get_data_from_pack(delta_obj);`
fix multiple issues in index-pack Since commit 9441b61dc5, two issues affected correct behavior of index-pack: 1) The real_type of a delta object is the 'real_type' of its base, not the 'type' which can be a "delta type". Consequence of this is a corrupted pack index file which only needs to be recreated with a good index-pack command ('git verify-pack' will flag those). 2) The code sequence: result->data = patch_delta(get_base_data(base), base->obj->size, delta_data, delta_size, &result->size); has two issues of its own since base->obj->size should instead be base->size as we want the size of the actual object data and not the size of the delta object it is represented by. Except that simply replacing base->obj->size with base->size won't make the code more correct as the C language doesn't enforce a particular ordering for the evaluation of needed arguments for a function call, hence base->size could be pushed on the stack before get_base_data() which initializes base->size is called. Signed-off-by: Nicolas Pitre <nico@cam.org> Tested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-20 22:46:19 +02:00			`base_data = get_base_data(base);`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`result->obj = delta_obj;`
fix multiple issues in index-pack Since commit 9441b61dc5, two issues affected correct behavior of index-pack: 1) The real_type of a delta object is the 'real_type' of its base, not the 'type' which can be a "delta type". Consequence of this is a corrupted pack index file which only needs to be recreated with a good index-pack command ('git verify-pack' will flag those). 2) The code sequence: result->data = patch_delta(get_base_data(base), base->obj->size, delta_data, delta_size, &result->size); has two issues of its own since base->obj->size should instead be base->size as we want the size of the actual object data and not the size of the delta object it is represented by. Except that simply replacing base->obj->size with base->size won't make the code more correct as the C language doesn't enforce a particular ordering for the evaluation of needed arguments for a function call, hence base->size could be pushed on the stack before get_base_data() which initializes base->size is called. Signed-off-by: Nicolas Pitre <nico@cam.org> Tested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-20 22:46:19 +02:00			`result->data = patch_delta(base_data, base->size,`
			`delta_data, delta_obj->size, &result->size);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`free(delta_data);`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`if (!result->data)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(delta_obj->idx.offset, _("failed to apply delta"));`
index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`hash_sha1_file(result->data, result->size,`
			`typename(delta_obj->real_type), delta_obj->idx.sha1);`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`sha1_object(result->data, NULL, result->size, delta_obj->real_type,`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`delta_obj->idx.sha1);`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`counter_lock();`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`nr_resolved_deltas++;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`counter_unlock();`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`}`

index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`static struct base_data find_unresolved_deltas_1(struct base_data base,`
			`struct base_data *prev_base)`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`{`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`if (base->ref_last == -1 && base->ofs_last == -1) {`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`union delta_base base_spec;`

			`hashcpy(base_spec.sha1, base->obj->idx.sha1);`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`find_delta_children(&base_spec,`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`&base->ref_first, &base->ref_last, OBJ_REF_DELTA);`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`memset(&base_spec, 0, sizeof(base_spec));`
			`base_spec.offset = base->obj->idx.offset;`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`find_delta_children(&base_spec,`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`&base->ofs_first, &base->ofs_last, OBJ_OFS_DELTA);`
index-pack: Chain the struct base_data on the stack for traversal We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:45 +02:00
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`if (base->ref_last == -1 && base->ofs_last == -1) {`
			`free(base->data);`
			`return NULL;`
			`}`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`link_base_data(prev_base, base);`
			`}`
index-pack: Chain the struct base_data on the stack for traversal We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:45 +02:00
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`if (base->ref_first <= base->ref_last) {`
			`struct object_entry *child = objects + deltas[base->ref_first].obj_no;`
			`struct base_data *result = alloc_base_data();`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00
			`assert(child->real_type == OBJ_REF_DELTA);`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`resolve_delta(child, base, result);`
			`if (base->ref_first == base->ref_last && base->ofs_last == -1)`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`free_base_data(base);`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00
			`base->ref_first++;`
			`return result;`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`}`

index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`if (base->ofs_first <= base->ofs_last) {`
			`struct object_entry *child = objects + deltas[base->ofs_first].obj_no;`
			`struct base_data *result = alloc_base_data();`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00
			`assert(child->real_type == OBJ_OFS_DELTA);`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`resolve_delta(child, base, result);`
			`if (base->ofs_first == base->ofs_last)`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`free_base_data(base);`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00
			`base->ofs_first++;`
			`return result;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`unlink_base_data(base);`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`return NULL;`
			`}`

			`static void find_unresolved_deltas(struct base_data *base)`
			`{`
			`struct base_data new_base, prev_base = NULL;`
			`for (;;) {`
			`new_base = find_unresolved_deltas_1(base, prev_base);`

			`if (new_base) {`
			`prev_base = base;`
			`base = new_base;`
			`} else {`
			`free(base);`
			`base = prev_base;`
			`if (!base)`
			`return;`
			`prev_base = base->base;`
			`}`
			`}`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`

			`static int compare_delta_entry(const void a, const void b)`
			`{`
			`const struct delta_entry *delta_a = a;`
			`const struct delta_entry *delta_b = b;`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00
			`/* group by type (ref vs ofs) and then by value (sha-1 or offset) */`
			`return compare_delta_bases(&delta_a->base, &delta_b->base,`
			`objects[delta_a->obj_no].type,`
			`objects[delta_b->obj_no].type);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`

index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`static void resolve_base(struct object_entry *obj)`
			`{`
			`struct base_data *base_obj = alloc_base_data();`
			`base_obj->obj = obj;`
			`base_obj->data = NULL;`
			`find_unresolved_deltas(base_obj);`
			`}`

index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`#ifndef NO_PTHREADS`
			`static void threaded_second_pass(void data)`
			`{`
			`set_thread_data(data);`
			`for (;;) {`
			`int i;`
			`work_lock();`
			`display_progress(progress, nr_resolved_deltas);`
			`while (nr_dispatched < nr_objects &&`
			`is_delta_type(objects[nr_dispatched].type))`
			`nr_dispatched++;`
			`if (nr_dispatched >= nr_objects) {`
			`work_unlock();`
			`break;`
			`}`
			`i = nr_dispatched++;`
			`work_unlock();`

			`resolve_base(&objects[i]);`
			`}`
			`return NULL;`
			`}`
			`#endif`

index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`/*`
			`* First pass:`
			`* - find locations of all objects;`
			`* - calculate SHA1 of all non-delta objects;`
			`* - remember base (SHA1 or offset) for all deltas.`
			`*/`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`static void parse_pack_objects(unsigned char *sha1)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`int i, nr_delays = 0;`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`struct delta_entry *delta = deltas;`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`struct stat st;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
make progress "title" part of the common progress interface If the progress bar ends up in a box, better provide a title for it too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 20:10:07 +02:00			`if (verbose)`
add throughput display to index-pack ... and call it "Receiving objects" when over stdin to look clearer to end users. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:35 +01:00			`progress = start_progress(`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`from_stdin ? _("Receiving objects") : _("Indexing objects"),`
add throughput display to index-pack ... and call it "Receiving objects" when over stdin to look clearer to end users. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:35 +01:00			`nr_objects);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`for (i = 0; i < nr_objects; i++) {`
			`struct object_entry *obj = &objects[i];`
index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`void *data = unpack_raw_entry(obj, &delta->base, obj->idx.sha1);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`obj->real_type = obj->type;`
index-pack: a miniscule refactor Introduce a helper function that takes the type of an object and tell if it is a delta, as we seem to use this check in many places. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:14 +02:00			`if (is_delta_type(obj->type)) {`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`nr_deltas++;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`delta->obj_no = i;`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`delta++;`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`} else if (!data) {`
			`/* large blobs, check later */`
			`obj->real_type = OBJ_BAD;`
			`nr_delays++;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`} else`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`sha1_object(data, NULL, obj->size, obj->type, obj->idx.sha1);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`free(data);`
relax usage of the progress API Since it is now OK to pass a null pointer to display_progress() and stop_progress() resulting in a no-op, then we can simplify the code and remove a bunch of lines by not making those calls conditional all the time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:33 +01:00			`display_progress(progress, i+1);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`objects[i].idx.offset = consumed_bytes;`
relax usage of the progress API Since it is now OK to pass a null pointer to display_progress() and stop_progress() resulting in a no-op, then we can simplify the code and remove a bunch of lines by not making those calls conditional all the time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:33 +01:00			`stop_progress(&progress);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00
			`/* Check pack integrity */`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`flush();`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Final(sha1, &input_ctx);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`if (hashcmp(fill(20), sha1))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("pack is corrupted (SHA1 mismatch)"));`
mimic unpack-objects when --stdin is used with index-pack It appears that git-unpack-objects writes the last part of the input buffer to stdout after the pack has been parsed. This looks a bit suspicious since the last fill() might have filled the buffer up to the 4096 byte limit and more data might still be pending on stdin, but since this is about being a drop-in replacement for unpack-objects let's simply duplicate the same behavior for now. [jc: with fix-up appeared in Nico's sleep] Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:31:53 +02:00			`use(20);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00
			`/* If input_fd is a file, we should have reached its end now. */`
			`if (fstat(input_fd, &st))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("cannot fstat packfile"));`
git-bundle: assorted fixes This patch fixes issues mentioned by Junio, Nico and Simon: - I forgot to convert the usage string when removing the "--" from the subcommands, - a style fix in the bundle_header, - use xread() instead of read(), - use write_or_die() instead of write(), - make the bundle header extensible, - fail if the whitespace after a sha1 of a reference is missing, - close() the fds passed to a subprocess, - in verify_bundle(), do not use "rev-list --stdin", but rather pass the revs directly (avoiding a fork()), - fix a corrupted comment in show_object(), and - fix the size check in index_pack. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-22 19:14:14 +01:00			`if (S_ISREG(st.st_mode) &&`
			`lseek(input_fd, 0, SEEK_CUR) - input_len != st.st_size)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("pack has junk at the end"));`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00
			`for (i = 0; i < nr_objects; i++) {`
			`struct object_entry *obj = &objects[i];`
			`if (obj->real_type != OBJ_BAD)`
			`continue;`
			`obj->real_type = obj->type;`
			`sha1_object(NULL, obj, obj->size, obj->type, obj->idx.sha1);`
			`nr_delays--;`
			`}`
			`if (nr_delays)`
			`die(_("confusion beyond insanity in parse_pack_objects()"));`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`}`

			`/*`
			`* Second pass:`
			`* - for all non-delta objects, look if it is used as a base for`
			`* deltas;`
			`* - if used as a base, uncompress the object and apply all deltas,`
			`* recursively checking if the resulting object is used as a base`
			`* for some more deltas.`
			`*/`
			`static void resolve_deltas(void)`
			`{`
			`int i;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
add progress status to index-pack This is more interesting to look at when performing a big fetch. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:32:59 +02:00			`if (!nr_deltas)`
			`return;`

teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`/* Sort deltas by base SHA1/offset for fast searching */`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`qsort(deltas, nr_deltas, sizeof(struct delta_entry),`
			`compare_delta_entry);`

make progress "title" part of the common progress interface If the progress bar ends up in a box, better provide a title for it too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 20:10:07 +02:00			`if (verbose)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`progress = start_progress(_("Resolving deltas"), nr_deltas);`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00
			`#ifndef NO_PTHREADS`
			`nr_dispatched = 0;`
			`if (nr_threads > 1 \|\| getenv("GIT_FORCE_THREADS")) {`
			`init_thread();`
			`for (i = 0; i < nr_threads; i++) {`
			`int ret = pthread_create(&thread_data[i].thread, NULL,`
			`threaded_second_pass, thread_data + i);`
			`if (ret)`
			`die("unable to create thread: %s", strerror(ret));`
			`}`
			`for (i = 0; i < nr_threads; i++)`
			`pthread_join(thread_data[i].thread, NULL);`
			`cleanup_thread();`
			`return;`
			`}`
			`#endif`

Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`for (i = 0; i < nr_objects; i++) {`
			`struct object_entry *obj = &objects[i];`

index-pack: a miniscule refactor Introduce a helper function that takes the type of an object and tell if it is a delta, as we seem to use this check in many places. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:14 +02:00			`if (is_delta_type(obj->type))`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`continue;`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`resolve_base(obj);`
relax usage of the progress API Since it is now OK to pass a null pointer to display_progress() and stop_progress() resulting in a no-op, then we can simplify the code and remove a bunch of lines by not making those calls conditional all the time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:33 +01:00			`display_progress(progress, nr_resolved_deltas);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`}`

index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`/*`
			`* Third pass:`
			`* - append objects to convert thin pack to full pack if required`
			`* - write the final 20-byte SHA-1`
			`*/`
			`static void fix_unresolved_deltas(struct sha1file *f, int nr_unresolved);`
			`static void conclude_pack(int fix_thin_pack, const char curr_pack, unsigned char pack_sha1)`
			`{`
			`if (nr_deltas == nr_resolved_deltas) {`
			`stop_progress(&progress);`
			`/* Flush remaining pack final 20-byte SHA1. */`
			`flush();`
			`return;`
			`}`

			`if (fix_thin_pack) {`
			`struct sha1file *f;`
			`unsigned char read_sha1[20], tail_sha1[20];`
			`char msg[48];`
			`int nr_unresolved = nr_deltas - nr_resolved_deltas;`
			`int nr_objects_initial = nr_objects;`
			`if (nr_unresolved <= 0)`
Merge branch 'nd/threaded-index-pack' Enables threading in index-pack to resolve base data in parallel. By Nguyễn Thái Ngọc Duy (3) and Ramsay Jones (1) * nd/threaded-index-pack: index-pack: disable threading if NO_PREAD is defined index-pack: support multithreaded delta resolving index-pack: restructure pack processing into three main functions compat/win32/pthread.h: Add an pthread_key_delete() implementation 2012-05-14 20:50:40 +02:00			`die(_("confusion beyond insanity"));`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`objects = xrealloc(objects,`
			`(nr_objects + nr_unresolved + 1)`
			`* sizeof(*objects));`
			`f = sha1fd(output_fd, curr_pack);`
			`fix_unresolved_deltas(f, nr_unresolved);`
			`sprintf(msg, "completed with %d local objects",`
			`nr_objects - nr_objects_initial);`
			`stop_progress_msg(&progress, msg);`
			`sha1close(f, tail_sha1, 0);`
			`hashcpy(read_sha1, pack_sha1);`
			`fixup_pack_header_footer(output_fd, pack_sha1,`
			`curr_pack, nr_objects,`
			`read_sha1, consumed_bytes-20);`
			`if (hashcmp(read_sha1, tail_sha1) != 0)`
			`die("Unexpected tail checksum for %s "`
			`"(disk corruption?)", curr_pack);`
			`}`
			`if (nr_deltas != nr_resolved_deltas)`
Merge branch 'nd/threaded-index-pack' Enables threading in index-pack to resolve base data in parallel. By Nguyễn Thái Ngọc Duy (3) and Ramsay Jones (1) * nd/threaded-index-pack: index-pack: disable threading if NO_PREAD is defined index-pack: support multithreaded delta resolving index-pack: restructure pack processing into three main functions compat/win32/pthread.h: Add an pthread_key_delete() implementation 2012-05-14 20:50:40 +02:00			`die(Q_("pack has %d unresolved delta",`
			`"pack has %d unresolved deltas",`
			`nr_deltas - nr_resolved_deltas),`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`nr_deltas - nr_resolved_deltas);`
			`}`

index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`static int write_compressed(struct sha1file f, void in, unsigned int size)`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`{`
zlib: zlib can only process 4GB at a time The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 20:52:15 +02:00			`git_zstream stream;`
index-pack: smarter memory usage when appending objects In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in write_compressed() in order to write it. Let's deflate and write the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 22:50:35 +02:00			`int status;`
			`unsigned char outbuf[4096];`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00
			`memset(&stream, 0, sizeof(stream));`
zlib: wrap deflate side of the API Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 19:55:10 +02:00			`git_deflate_init(&stream, zlib_compression_level);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`stream.next_in = in;`
			`stream.avail_in = size;`

index-pack: smarter memory usage when appending objects In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in write_compressed() in order to write it. Let's deflate and write the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 22:50:35 +02:00			`do {`
			`stream.next_out = outbuf;`
			`stream.avail_out = sizeof(outbuf);`
zlib: wrap deflate side of the API Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 19:55:10 +02:00			`status = git_deflate(&stream, Z_FINISH);`
index-pack: smarter memory usage when appending objects In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in write_compressed() in order to write it. Let's deflate and write the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 22:50:35 +02:00			`sha1write(f, outbuf, sizeof(outbuf) - stream.avail_out);`
			`} while (status == Z_OK);`

			`if (status != Z_STREAM_END)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("unable to deflate appended object (%d)"), status);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`size = stream.total_out;`
zlib: wrap deflate side of the API Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 19:55:10 +02:00			`git_deflate_end(&stream);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`return size;`
			`}`

index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`static struct object_entry append_obj_to_pack(struct sha1file f,`
index-pack: Track the object_entry that creates each base_data If we free the data stored within a base_data we need the struct object_entry to get the data back again for use with another dependent delta. Storing the object_entry* in base_data makes it simple to call get_data_from_pack() to recover the compressed information. This however means that we must add the missing base object to the end of our packfile prior to calling resolve_delta() on each of the dependent deltas. Adding the base first ensures we can read the base back from the pack we are indexing, as if it had been included by the remote side. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:46 +02:00			`const unsigned char sha1, void buf,`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`unsigned long size, enum object_type type)`
			`{`
			`struct object_entry *obj = &objects[nr_objects++];`
			`unsigned char header[10];`
			`unsigned long s = size;`
			`int n = 0;`
			`unsigned char c = (type << 4) \| (s & 15);`
			`s >>= 4;`
			`while (s) {`
			`header[n++] = c \| 0x80;`
			`c = s & 0x7f;`
			`s >>= 7;`
			`}`
			`header[n++] = c;`
index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`crc32_begin(f);`
			`sha1write(f, header, n);`
index-pack.c: correctly initialize appended objects When index-pack completes a thin pack it appends objects to the pack. Since the commit 92392b4(index-pack: Honor core.deltaBaseCacheLimit when resolving deltas) such an object can be pruned in case of memory pressure, and will be read back again by get_data_from_pack(). For this to work, the fields in object_entry structure need to be initialized properly. Noticed by Pierre Habouzit. Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de> Acked-by: Nicolas Pitre <nico@cam.org> Acked-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-24 19:32:00 +02:00			`obj[0].size = size;`
			`obj[0].hdr_size = n;`
			`obj[0].type = type;`
			`obj[0].real_type = type;`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`obj[1].idx.offset = obj[0].idx.offset + n;`
index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`obj[1].idx.offset += write_compressed(f, buf, size);`
			`obj[0].idx.crc32 = crc32_end(f);`
fix pread()'s short read in index-pack Since v1.6.0.2~13^2~ the completion of a thin pack uses sha1write() for its ability to compute a SHA1 on the written data. This also provides data buffering which, along with commit 92392b4a45, will confuse pread() whenever an appended object is 1) freed due to memory pressure because of the depth-first delta processing, and 2) needed again because it has many delta children, and 3) its data is still buffered by sha1write(). Let's fix the issue by simply forcing cached data out when such an object is written so it can be pread()'d at leisure. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-10 04:08:51 +02:00			`sha1flush(f);`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`hashcpy(obj->idx.sha1, sha1);`
index-pack: Track the object_entry that creates each base_data If we free the data stored within a base_data we need the struct object_entry to get the data back again for use with another dependent delta. Storing the object_entry* in base_data makes it simple to call get_data_from_pack() to recover the compressed information. This however means that we must add the missing base object to the end of our packfile prior to calling resolve_delta() on each of the dependent deltas. Adding the base first ensures we can read the base back from the pack we are indexing, as if it had been included by the remote side. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:46 +02:00			`return obj;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`}`

			`static int delta_pos_compare(const void _a, const void _b)`
			`{`
			`struct delta_entry a = (struct delta_entry **)_a;`
			`struct delta_entry b = (struct delta_entry **)_b;`
			`return a->obj_no - b->obj_no;`
			`}`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`static void fix_unresolved_deltas(struct sha1file *f, int nr_unresolved)`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`{`
			`struct delta_entry **sorted_by_pos;`
common progress display support Instead of having this code duplicated in multiple places, let's have a common interface for progress display. If someday someone wishes to display a cheezy progress bar instead then only one file will have to be changed. Note: I left merge-recursive.c out since it has a strange notion of progress as it apparently increase the expected total number as it goes. Someone with more intimate knowledge of what that is supposed to mean might look at converting it to the common progress interface. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-18 20:27:45 +02:00			`int i, n = 0;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00
			`/*`
			`* Since many unresolved deltas may well be themselves base objects`
			`* for more unresolved deltas, we really want to include the`
			`* smallest number of base objects that would cover as much delta`
			`* as possible by picking the`
			`* trunc deltas first, allowing for other deltas to resolve without`
			`* additional base objects. Since most base objects are to be found`
			`* before deltas depending on them, a good heuristic is to start`
			`* resolving deltas in the same order as their position in the pack.`
			`*/`
			`sorted_by_pos = xmalloc(nr_unresolved * sizeof(*sorted_by_pos));`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`for (i = 0; i < nr_deltas; i++) {`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`if (objects[deltas[i].obj_no].real_type != OBJ_REF_DELTA)`
			`continue;`
			`sorted_by_pos[n++] = &deltas[i];`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`qsort(sorted_by_pos, n, sizeof(*sorted_by_pos), delta_pos_compare);`

			`for (i = 0; i < n; i++) {`
			`struct delta_entry *d = sorted_by_pos[i];`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`struct base_data *base_obj = alloc_base_data();`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00
			`if (objects[d->obj_no].real_type != OBJ_REF_DELTA)`
			`continue;`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`base_obj->data = read_sha1_file(d->base.sha1, &type, &base_obj->size);`
			`if (!base_obj->data)`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`continue;`

index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`if (check_sha1_signature(d->base.sha1, base_obj->data,`
			`base_obj->size, typename(type)))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("local object %s is corrupt"), sha1_to_hex(d->base.sha1));`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`base_obj->obj = append_obj_to_pack(f, d->base.sha1,`
			`base_obj->data, base_obj->size, type);`
			`find_unresolved_deltas(base_obj);`
relax usage of the progress API Since it is now OK to pass a null pointer to display_progress() and stop_progress() resulting in a no-op, then we can simplify the code and remove a bunch of lines by not making those calls conditional all the time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:33 +01:00			`display_progress(progress, nr_resolved_deltas);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`}`
			`free(sorted_by_pos);`
			`}`

enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`static void final(const char final_pack_name, const char curr_pack_name,`
			`const char final_index_name, const char curr_index_name,`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`const char keep_name, const char keep_msg,`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`unsigned char *sha1)`
			`{`
General const correctness fixes We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-07 02:44:17 +01:00			`const char *report = "pack";`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`char name[PATH_MAX];`
			`int err;`

			`if (!from_stdin) {`
			`close(input_fd);`
			`} else {`
Make pack creation always fsync() the result This means that we can depend on packs always being stable on disk, simplifying a lot of the object serialization worries. And unlike loose objects, serializing pack creation IO isn't going to be a performance killer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-30 17:42:16 +02:00			`fsync_or_die(output_fd, curr_pack_name);`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`err = close(output_fd);`
			`if (err)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("error while closing pack file"));`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`}`

Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`if (keep_msg) {`
			`int keep_fd, keep_msg_len = strlen(keep_msg);`
Make sure objects/pack exists before creating a new pack In a repository created with git older than f49fb35 (git-init-db: create "pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is not created upon initialization. It was Ok because subdirectories are created as needed inside directories init-db creates, and back then, packfiles were recent invention. After the said commit, new codepaths started relying on the presense of objects/pack/ directory in the repository. This was exacerbated with 8b4eb6b (Do not perform cross-directory renames when creating packs, 2008-09-22) that moved the location temporary pack files are created from objects/ directory to objects/pack/ directory, because moving temporary to the final location was done carefully with lazy leading directory creation. Many packfile related operations in such an old repository can fail mysteriously because of this. This commit introduces two helper functions to make things work better. - odb_mkstemp() is a specialized version of mkstemp() to refactor the code and teach it to create leading directories as needed; - odb_pack_keep() refactors the code to create a ".keep" file while create leading directories as needed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-25 08:11:29 +01:00
			`if (!keep_name)`
			`keep_fd = odb_pack_keep(name, sizeof(name), sha1);`
			`else`
			`keep_fd = open(keep_name, O_RDWR\|O_CREAT\|O_EXCL, 0600);`

have index-pack create .keep file more carefully If by chance we receive a pack which content (list of objects) matches another pack that we already have, and if that pack is marked with a .keep file, then we should not overwrite it. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:24 +01:00			`if (keep_fd < 0) {`
			`if (errno != EEXIST)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("cannot write keep file '%s'"),`
Convert existing die(..., strerror(errno)) to die_errno() Change calls to die(..., strerror(errno)) to use the new die_errno(). In the process, also make slight style adjustments: at least state _something_ about the function that failed (instead of just printing the pathname), and put paths in single quotes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-27 17:58:46 +02:00			`keep_name);`
have index-pack create .keep file more carefully If by chance we receive a pack which content (list of objects) matches another pack that we already have, and if that pack is marked with a .keep file, then we should not overwrite it. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:24 +01:00			`} else {`
			`if (keep_msg_len > 0) {`
			`write_or_die(keep_fd, keep_msg, keep_msg_len);`
			`write_or_die(keep_fd, "\n", 1);`
			`}`
detect close failure on just-written file handles I audited git for potential undetected write failures. In the cases fixed below, the diagnostics I add mimic the diagnostics used in surrounding code, even when that means not reporting the precise strerror(errno) cause of the error. Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-24 21:20:41 +02:00			`if (close(keep_fd) != 0)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("cannot close written keep file '%s'"),`
Convert existing die(..., strerror(errno)) to die_errno() Change calls to die(..., strerror(errno)) to use the new die_errno(). In the process, also make slight style adjustments: at least state _something_ about the function that failed (instead of just printing the pathname), and put paths in single quotes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-27 17:58:46 +02:00			`keep_name);`
remove .keep pack lock files when done with refs update This makes both git-fetch and git-push (fetch-pack and receive-pack) safe against a possible race with aparallel git-repack -a -d that could prune the new pack while it is not yet referenced, and remove the .keep file after refs have been updated. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:25 +01:00			`report = "keep";`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`}`
			`}`

enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`if (final_pack_name != curr_pack_name) {`
			`if (!final_pack_name) {`
			`snprintf(name, sizeof(name), "%s/pack/pack-%s.pack",`
			`get_object_directory(), sha1_to_hex(sha1));`
			`final_pack_name = name;`
			`}`
			`if (move_temp_to_file(curr_pack_name, final_pack_name))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("cannot store pack file"));`
Move chmod(foo, 0444) into move_temp_to_file() When writing out a loose object or a pack (index), move_temp_to_file() is called to finalize the resulting file. These files (loose files and packs) should all have permission mode 0444 (modulo adjust_shared_perm()). Therefore, instead of doing chmod(foo, 0444) explicitly from each callsite (or even forgetting to chmod() at all), do the chmod() call from within move_temp_to_file(). Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-26 16:16:47 +01:00			`} else if (from_stdin)`
Do not rename read-only files during a push Win32 does not allow renaming read-only files (at least on a Samba share), making push into a local directory to fail. Thus, defer the chmod() call in index-pack.c:final() only after move_temp_to_file() was called. Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-03 12:20:43 +02:00			`chmod(final_pack_name, 0444);`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00
			`if (final_index_name != curr_index_name) {`
			`if (!final_index_name) {`
			`snprintf(name, sizeof(name), "%s/pack/pack-%s.idx",`
			`get_object_directory(), sha1_to_hex(sha1));`
			`final_index_name = name;`
			`}`
			`if (move_temp_to_file(curr_index_name, final_index_name))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("cannot store index file"));`
Move chmod(foo, 0444) into move_temp_to_file() When writing out a loose object or a pack (index), move_temp_to_file() is called to finalize the resulting file. These files (loose files and packs) should all have permission mode 0444 (modulo adjust_shared_perm()). Therefore, instead of doing chmod(foo, 0444) explicitly from each callsite (or even forgetting to chmod() at all), do the chmod() call from within move_temp_to_file(). Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-26 16:16:47 +01:00			`} else`
			`chmod(final_index_name, 0444);`
remove .keep pack lock files when done with refs update This makes both git-fetch and git-push (fetch-pack and receive-pack) safe against a possible race with aparallel git-repack -a -d that could prune the new pack while it is not yet referenced, and remove the .keep file after refs have been updated. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:25 +01:00
			`if (!from_stdin) {`
			`printf("%s\n", sha1_to_hex(sha1));`
			`} else {`
			`char buf[48];`
			`int len = snprintf(buf, sizeof(buf), "%s\t%s\n",`
			`report, sha1_to_hex(sha1));`
index-pack: write-or-die instead of unchecked write-in-full. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-11 22:15:51 +01:00			`write_or_die(1, buf, len);`
remove .keep pack lock files when done with refs update This makes both git-fetch and git-push (fetch-pack and receive-pack) safe against a possible race with aparallel git-repack -a -d that could prune the new pack while it is not yet referenced, and remove the .keep file after refs have been updated. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:25 +01:00
			`/*`
			`* Let's just mimic git-unpack-objects here and write`
			`* the last part of the input buffer to stdout.`
			`*/`
			`while (input_len) {`
			`err = xwrite(1, input_buffer + input_offset, input_len);`
			`if (err <= 0)`
			`break;`
			`input_len -= err;`
			`input_offset += err;`
			`}`
			`}`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`

Provide git_config with a callback-data parameter git_config() only had a function parameter, but no callback data parameter. This assumes that all callback functions only modify global variables. With this patch, every callback gets a void * parameter, and it is hoped that this will help the libification effort. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-14 19:46:53 +02:00			`static int git_index_pack_config(const char k, const char v, void *cb)`
make the pack index version configurable It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-02 04:26:04 +01:00			`{`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`struct pack_idx_option *opts = cb;`

make the pack index version configurable It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-02 04:26:04 +01:00			`if (!strcmp(k, "pack.indexversion")) {`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`opts->version = git_config_int(k, v);`
			`if (opts->version > 2)`
			`die("bad pack.indexversion=%"PRIu32, opts->version);`
make the pack index version configurable It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-02 04:26:04 +01:00			`return 0;`
			`}`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`if (!strcmp(k, "pack.threads")) {`
			`nr_threads = git_config_int(k, v);`
			`if (nr_threads < 0)`
			`die("invalid number of threads specified (%d)",`
			`nr_threads);`
			`#ifdef NO_PTHREADS`
			`if (nr_threads != 1)`
			`warning("no threads support, ignoring %s", k);`
			`nr_threads = 1;`
			`#endif`
			`return 0;`
			`}`
Provide git_config with a callback-data parameter git_config() only had a function parameter, but no callback data parameter. This assumes that all callback functions only modify global variables. With this patch, every callback gets a void * parameter, and it is hoped that this will help the libification effort. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-14 19:46:53 +02:00			`return git_default_config(k, v, cb);`
make the pack index version configurable It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-02 04:26:04 +01:00			`}`

index-pack --verify: read anomalous offsets from v2 idx file A pack v2 .idx file usually records offset using 64-bit representation only when the offset does not fit within 31-bit, but you can handcraft your .idx file to record smaller offset using 64-bit, storing all zero in the upper 4-byte. By inspecting the original idx file when running index-pack --verify, encode such low offsets that do not need to be in 64-bit but are encoded using 64-bit just like the original idx file so that we can still validate the pack/idx pair by comparing the idx file recomputed with the original. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 01:55:26 +01:00			`static int cmp_uint32(const void a_, const void b_)`
			`{`
			`uint32_t a = ((uint32_t )a_);`
			`uint32_t b = ((uint32_t )b_);`

			`return (a < b) ? -1 : (a != b);`
			`}`

			`static void read_v2_anomalous_offsets(struct packed_git *p,`
			`struct pack_idx_option *opts)`
			`{`
			`const uint32_t idx1, idx2;`
			`uint32_t i;`

			`/* The address of the 4-byte offset table */`
			`idx1 = (((const uint32_t *)p->index_data)`
			`+ 2 /* 8-byte header */`
			`+ 256 /* fan out */`
			`+ 5 * p->num_objects /* 20-byte SHA-1 table */`
			`+ p->num_objects /* CRC32 table */`
			`);`

			`/* The address of the 8-byte offset table */`
			`idx2 = idx1 + p->num_objects;`

			`for (i = 0; i < p->num_objects; i++) {`
			`uint32_t off = ntohl(idx1[i]);`
			`if (!(off & 0x80000000))`
			`continue;`
			`off = off & 0x7fffffff;`
			`if (idx2[off * 2])`
			`continue;`
			`/*`
			`* The real offset is ntohl(idx2[off * 2]) in high 4`
			`* octets, and ntohl(idx2[off * 2 + 1]) in low 4`
			`* octets. But idx2[off * 2] is Zero!!!`
			`*/`
			`ALLOC_GROW(opts->anomaly, opts->anomaly_nr + 1, opts->anomaly_alloc);`
			`opts->anomaly[opts->anomaly_nr++] = ntohl(idx2[off * 2 + 1]);`
			`}`

			`if (1 < opts->anomaly_nr)`
			`qsort(opts->anomaly, opts->anomaly_nr, sizeof(uint32_t), cmp_uint32);`
			`}`

index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`static void read_idx_option(struct pack_idx_option opts, const char pack_name)`
			`{`
			`struct packed_git *p = add_packed_git(pack_name, strlen(pack_name), 1);`

			`if (!p)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("Cannot open existing pack file '%s'"), pack_name);`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`if (open_pack_index(p))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("Cannot open existing pack idx file for '%s'"), pack_name);`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00
			`/* Read the attributes from the existing idx file */`
			`opts->version = p->index_version;`

index-pack --verify: read anomalous offsets from v2 idx file A pack v2 .idx file usually records offset using 64-bit representation only when the offset does not fit within 31-bit, but you can handcraft your .idx file to record smaller offset using 64-bit, storing all zero in the upper 4-byte. By inspecting the original idx file when running index-pack --verify, encode such low offsets that do not need to be in 64-bit but are encoded using 64-bit just like the original idx file so that we can still validate the pack/idx pair by comparing the idx file recomputed with the original. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 01:55:26 +01:00			`if (opts->version == 2)`
			`read_v2_anomalous_offsets(p, opts);`

index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`/*`
			`* Get rid of the idx file as we do not need it anymore.`
			`* NEEDSWORK: extract this bit from free_pack_by_name() in`
			`* sha1_file.c, perhaps? It shouldn't matter very much as we`
			`* know we haven't installed this pack (hence we never have`
			`* read anything from it).`
			`*/`
			`close_pack_index(p);`
			`free(p);`
			`}`

index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`static void show_pack_info(int stat_only)`
			`{`
index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00			`int i, baseobjects = nr_objects - nr_deltas;`
			`unsigned long *chain_histogram = NULL;`

			`if (deepest_delta)`
			`chain_histogram = xcalloc(deepest_delta, sizeof(unsigned long));`

index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`for (i = 0; i < nr_objects; i++) {`
			`struct object_entry *obj = &objects[i];`

index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00			`if (is_delta_type(obj->type))`
			`chain_histogram[obj->delta_depth - 1]++;`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`if (stat_only)`
			`continue;`
			`printf("%s %-6s %lu %lu %"PRIuMAX,`
			`sha1_to_hex(obj->idx.sha1),`
			`typename(obj->real_type), obj->size,`
			`(unsigned long)(obj[1].idx.offset - obj->idx.offset),`
			`(uintmax_t)obj->idx.offset);`
			`if (is_delta_type(obj->type)) {`
			`struct object_entry *bobj = &objects[obj->base_object_no];`
			`printf(" %u %s", obj->delta_depth, sha1_to_hex(bobj->idx.sha1));`
			`}`
			`putchar('\n');`
			`}`
index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00
			`if (baseobjects)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`printf_ln(Q_("non delta: %d object",`
			`"non delta: %d objects",`
			`baseobjects),`
			`baseobjects);`
index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00			`for (i = 0; i < deepest_delta; i++) {`
			`if (!chain_histogram[i])`
			`continue;`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`printf_ln(Q_("chain length = %d: %lu object",`
			`"chain length = %d: %lu objects",`
			`chain_histogram[i]),`
			`i + 1,`
			`chain_histogram[i]);`
index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00			`}`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`}`

make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`int cmd_index_pack(int argc, const char *argv, const char prefix)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`int i, fix_thin_pack = 0, verify = 0, stat_only = 0, stat = 0;`
make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`const char curr_pack, curr_index;`
			`const char index_name = NULL, pack_name = NULL;`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`const char keep_name = NULL, keep_msg = NULL;`
			`char index_name_buf = NULL, keep_name_buf = NULL;`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`struct pack_idx_entry **idx_objects;`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`struct pack_idx_option opts;`
index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`unsigned char pack_sha1[20];`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
Let 'git <command> -h' show usage without a git dir There is no need for "git <command> -h" to depend on being inside a repository. Reported by Gerfried Fuchs through http://bugs.debian.org/462557 Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-11-09 16:05:01 +01:00			`if (argc == 2 && !strcmp(argv[1], "-h"))`
			`usage(index_pack_usage);`

index-pack: Don't follow replace refs. Without this, attempting to index a pack containing objects that have been replaced results in a fatal error that looks like: fatal: SHA1 COLLISION FOUND WITH <replaced-object> ! Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Acked-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-12 16:18:12 +02:00			`read_replace_refs = 0;`

write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`reset_pack_idx_option(&opts);`
			`git_config(git_index_pack_config, &opts);`
Revert "rehabilitate 'git index-pack' inside the object store" Now setup_git_directory_gently behaves sanely even from subdirs of .git, so simplify index-pack by no longer protecting against that. This reverts commit a672ea6ac5a1b876bc7adfe6534b16fa2a32c94b (excluding tests). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-24 13:30:49 +02:00			`if (prefix && chdir(prefix))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("Cannot come back to cwd"));`
make the pack index version configurable It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-02 04:26:04 +01:00
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`for (i = 1; i < argc; i++) {`
make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`const char *arg = argv[i];`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
			`if (*arg == '-') {`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`if (!strcmp(arg, "--stdin")) {`
			`from_stdin = 1;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`} else if (!strcmp(arg, "--fix-thin")) {`
			`fix_thin_pack = 1;`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`} else if (!strcmp(arg, "--strict")) {`
			`strict = 1;`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`} else if (!strcmp(arg, "--verify")) {`
			`verify = 1;`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`} else if (!strcmp(arg, "--verify-stat")) {`
			`verify = 1;`
			`stat = 1;`
			`} else if (!strcmp(arg, "--verify-stat-only")) {`
			`verify = 1;`
			`stat = 1;`
			`stat_only = 1;`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`} else if (!strcmp(arg, "--keep")) {`
			`keep_msg = "";`
Mechanical conversion to use prefixcmp() This mechanically converts strncmp() to use prefixcmp(), but only when the parameters match specific patterns, so that they can be verified easily. Leftover from this will be fixed in a separate step, including idiotic conversions like if (!strncmp("foo", arg, 3)) => if (!(-prefixcmp(arg, "foo"))) This was done by using this script in px.perl #!/usr/bin/perl -i.bak -p if (/strncmp\(([^,]+), "([^\\"])", (\d+)\)/ && (length($2) == $3)) { s\|strncmp\(([^,]+), "([^\\"])", (\d+)\)\|prefixcmp($1, "$2")\|; } if (/strncmp\("([^\\"])", ([^,]+), (\d+)\)/ && (length($1) == $3)) { s\|strncmp\("([^\\"])", ([^,]+), (\d+)\)\|(-prefixcmp($2, "$1"))\|; } and running: $ git grep -l strncmp -- '*.c' \| xargs perl px.perl Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-20 10:53:29 +01:00			`} else if (!prefixcmp(arg, "--keep=")) {`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`keep_msg = arg + 7;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`} else if (!prefixcmp(arg, "--threads=")) {`
			`char *end;`
			`nr_threads = strtoul(arg+10, &end, 0);`
			`if (!arg[10] \|\| *end \|\| nr_threads < 0)`
			`usage(index_pack_usage);`
			`#ifdef NO_PTHREADS`
			`if (nr_threads != 1)`
			`warning("no threads support, "`
			`"ignoring %s", arg);`
			`nr_threads = 1;`
			`#endif`
Mechanical conversion to use prefixcmp() This mechanically converts strncmp() to use prefixcmp(), but only when the parameters match specific patterns, so that they can be verified easily. Leftover from this will be fixed in a separate step, including idiotic conversions like if (!strncmp("foo", arg, 3)) => if (!(-prefixcmp(arg, "foo"))) This was done by using this script in px.perl #!/usr/bin/perl -i.bak -p if (/strncmp\(([^,]+), "([^\\"])", (\d+)\)/ && (length($2) == $3)) { s\|strncmp\(([^,]+), "([^\\"])", (\d+)\)\|prefixcmp($1, "$2")\|; } if (/strncmp\("([^\\"])", ([^,]+), (\d+)\)/ && (length($1) == $3)) { s\|strncmp\("([^\\"])", ([^,]+), (\d+)\)\|(-prefixcmp($2, "$1"))\|; } and running: $ git grep -l strncmp -- '*.c' \| xargs perl px.perl Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-20 10:53:29 +01:00			`} else if (!prefixcmp(arg, "--pack_header=")) {`
Allow pack header preprocessing before unpack-objects/index-pack. Some applications which invoke unpack-objects or index-pack --stdin may want to examine the pack header to determine the number of objects contained in the pack and use that value to determine which executable to invoke to handle the rest of the pack stream. However if the caller consumes the pack header from the input stream then its no longer available for unpack-objects or index-pack --stdin, both of which need the version and object count to process the stream. This change introduces --pack_header=ver,cnt as a command line option that the caller can supply to indicate it has already consumed the pack header and what version and object count were found in that header. As this option is only meant for low level applications such as receive-pack we are not documenting it at this time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:20 +01:00			`struct pack_header *hdr;`
			`char *c;`

			`hdr = (struct pack_header *)input_buffer;`
			`hdr->hdr_signature = htonl(PACK_SIGNATURE);`
			`hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));`
			`if (*c != ',')`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("bad %s"), arg);`
Allow pack header preprocessing before unpack-objects/index-pack. Some applications which invoke unpack-objects or index-pack --stdin may want to examine the pack header to determine the number of objects contained in the pack and use that value to determine which executable to invoke to handle the rest of the pack stream. However if the caller consumes the pack header from the input stream then its no longer available for unpack-objects or index-pack --stdin, both of which need the version and object count to process the stream. This change introduces --pack_header=ver,cnt as a command line option that the caller can supply to indicate it has already consumed the pack header and what version and object count were found in that header. As this option is only meant for low level applications such as receive-pack we are not documenting it at this time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:20 +01:00			`hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));`
			`if (*c)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("bad %s"), arg);`
Allow pack header preprocessing before unpack-objects/index-pack. Some applications which invoke unpack-objects or index-pack --stdin may want to examine the pack header to determine the number of objects contained in the pack and use that value to determine which executable to invoke to handle the rest of the pack stream. However if the caller consumes the pack header from the input stream then its no longer available for unpack-objects or index-pack --stdin, both of which need the version and object count to process the stream. This change introduces --pack_header=ver,cnt as a command line option that the caller can supply to indicate it has already consumed the pack header and what version and object count were found in that header. As this option is only meant for low level applications such as receive-pack we are not documenting it at this time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:20 +01:00			`input_len = sizeof(*hdr);`
add progress status to index-pack This is more interesting to look at when performing a big fetch. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:32:59 +02:00			`} else if (!strcmp(arg, "-v")) {`
			`verbose = 1;`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`} else if (!strcmp(arg, "-o")) {`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`if (index_name \|\| (i+1) >= argc)`
			`usage(index_pack_usage);`
			`index_name = argv[++i];`
allow forcing index v2 and 64-bit offset treshold This is necessary for testing the new capabilities in some automated way without having an actual 4GB+ pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 23:32:03 +02:00			`} else if (!prefixcmp(arg, "--index-version=")) {`
			`char *c;`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`opts.version = strtoul(arg + 16, &c, 10);`
			`if (opts.version > 2)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("bad %s"), arg);`
allow forcing index v2 and 64-bit offset treshold This is necessary for testing the new capabilities in some automated way without having an actual 4GB+ pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 23:32:03 +02:00			`if (*c == ',')`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`opts.off32_limit = strtoul(c+1, &c, 0);`
			`if (*c \|\| opts.off32_limit & 0x80000000)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("bad %s"), arg);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`} else`
			`usage(index_pack_usage);`
			`continue;`
			`}`

			`if (pack_name)`
			`usage(index_pack_usage);`
			`pack_name = arg;`
			`}`

enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`if (!pack_name && !from_stdin)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`usage(index_pack_usage);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`if (fix_thin_pack && !from_stdin)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("--fix-thin cannot be used without --stdin"));`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`if (!index_name && pack_name) {`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`int len = strlen(pack_name);`
drop length argument of has_extension As Fredrik points out the current interface of has_extension() is potentially confusing. Its parameters include both a nul-terminated string and a length-limited string. This patch drops the length argument, requiring two nul-terminated strings; all callsites are updated. I checked that all of them indeed provide nul-terminated strings. Filenames need to be nul-terminated anyway if they are to be passed to open() etc. The performance penalty of the additional strlen() is negligible compared to the system calls which inevitably surround has_extension() calls. Additionally, change has_extension() to use size_t inside instead of int, as that is the exact type strlen() returns and memcmp() expects. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-11 14:01:45 +02:00			`if (!has_extension(pack_name, ".pack"))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("packfile name '%s' does not end with '.pack'"),`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`pack_name);`
An off-by-one bug found by valgrind Insufficient memory is allocated in index-pack.c to hold the *.idx name. One more byte should be allocated to hold the terminating 0. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-21 21:35:48 +01:00			`index_name_buf = xmalloc(len);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`memcpy(index_name_buf, pack_name, len - 5);`
			`strcpy(index_name_buf + len - 5, ".idx");`
			`index_name = index_name_buf;`
			`}`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`if (keep_msg && !keep_name && pack_name) {`
			`int len = strlen(pack_name);`
			`if (!has_extension(pack_name, ".pack"))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("packfile name '%s' does not end with '.pack'"),`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`pack_name);`
			`keep_name_buf = xmalloc(len);`
			`memcpy(keep_name_buf, pack_name, len - 5);`
			`strcpy(keep_name_buf + len - 5, ".keep");`
			`keep_name = keep_name_buf;`
			`}`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`if (verify) {`
			`if (!index_name)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("--verify with no packfile name given"));`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`read_idx_option(&opts, index_name);`
receive-pack, fetch-pack: reject bogus pack that records objects twice When receive-pack & fetch-pack are run and store the pack obtained over the wire to a local repository, they internally run the index-pack command with the --strict option. Make sure that we reject incoming packfile that records objects twice to avoid spreading such a damage. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-17 07:04:13 +01:00			`opts.flags \|= WRITE_IDX_VERIFY \| WRITE_IDX_STRICT;`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`}`
receive-pack, fetch-pack: reject bogus pack that records objects twice When receive-pack & fetch-pack are run and store the pack obtained over the wire to a local repository, they internally run the index-pack command with the --strict option. Make sure that we reject incoming packfile that records objects twice to avoid spreading such a damage. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-17 07:04:13 +01:00			`if (strict)`
			`opts.flags \|= WRITE_IDX_STRICT;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`#ifndef NO_PTHREADS`
			`if (!nr_threads) {`
			`nr_threads = online_cpus();`
			`/* An experiment showed that more threads does not mean faster */`
			`if (nr_threads > 3)`
			`nr_threads = 3;`
			`}`
			`#endif`

enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`curr_pack = open_pack_file(pack_name);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`parse_pack_header();`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));`
			`deltas = xcalloc(nr_objects, sizeof(struct delta_entry));`
index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`parse_pack_objects(pack_sha1);`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`resolve_deltas();`
			`conclude_pack(fix_thin_pack, curr_pack, pack_sha1);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`free(deltas);`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`if (strict)`
			`check_objects();`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`if (stat)`
			`show_pack_info(stat_only);`

Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`idx_objects = xmalloc((nr_objects) * sizeof(struct pack_idx_entry *));`
			`for (i = 0; i < nr_objects; i++)`
			`idx_objects[i] = &objects[i].idx;`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`curr_index = write_idx_file(index_name, idx_objects, nr_objects, &opts, pack_sha1);`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`free(idx_objects);`

index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`if (!verify)`
			`final(pack_name, curr_pack,`
			`index_name, curr_index,`
			`keep_name, keep_msg,`
			`pack_sha1);`
			`else`
			`close(input_fd);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`free(objects);`
			`free(index_name_buf);`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`free(keep_name_buf);`
fix for more minor memory leaks Now that some pointers have lost their const attribute, we can free their associated memory when done with them. This is more a correctness issue about the rule for freeing those pointers which isn't completely trivial more than the leak itself which didn't matter as the program is exiting anyway. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-17 03:55:50 +02:00			`if (pack_name == NULL)`
make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`free((void *) curr_pack);`
fix for more minor memory leaks Now that some pointers have lost their const attribute, we can free their associated memory when done with them. This is more a correctness issue about the rule for freeing those pointers which isn't completely trivial more than the leak itself which didn't matter as the program is exiting anyway. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-17 03:55:50 +02:00			`if (index_name == NULL)`
make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`free((void *) curr_index);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
			`return 0;`
			`}`