mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-01 06:47:52 +01:00

1685 lines

43 KiB

C

sparse: Fix a "symbol 'cmd_index_pack' not declared" warning Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-04-07 20:23:40 +02:00			`#include "builtin.h"`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`#include "delta.h"`
			`#include "pack.h"`
			`#include "csum-file.h"`
Use blob_, commit_, tag_, and tree_type throughout. This replaces occurrences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-02 14:44:09 +02:00			`#include "blob.h"`
			`#include "commit.h"`
			`#include "tag.h"`
			`#include "tree.h"`
common progress display support Instead of having this code duplicated in multiple places, let's have a common interface for progress display. If someday someone wishes to display a cheesy progress bar instead then only one file will have to be changed. Note: I left merge-recursive.c out since it has a strange notion of progress as it apparently increases the expected total number as it goes. Someone with more intimate knowledge of what that is supposed to mean might look at converting it to the common progress interface. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-18 20:27:45 +02:00			`#include "progress.h"`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduce broken objects or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`#include "fsck.h"`
Add calls to git_extract_argv0_path() in programs that call git_config_* Programs that use git_config need to find the global configuration. When runtime prefix computation is enabled, this requires that git_extract_argv0_path() is called early in the program's main(). This commit adds the necessary calls. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-18 13:00:12 +01:00			`#include "exec_cmd.h"`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`#include "streaming.h"`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`#include "thread-utils.h"`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
			`static const char index_pack_usage[] =`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-indexes the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31 bits. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
standardize brace placement in struct definitions In struct definitions, unlike functions, the prevailing style is for the opening brace to go on the same line as the struct name, like so: struct foo { int bar; char baz; }; Indeed, grepping for 'struct [a-z_] {$' yields about 5 times as many matches as 'struct [a-z_]*$'. Linus sayeth: Heretic people all over the world have claimed that this inconsistency is ... well ... inconsistent, but all right-thinking people know that (a) K&R are _right_ and (b) K&R are right. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-16 08:08:34 +01:00			`struct object_entry {`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when referring to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`struct pack_idx_entry idx;`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`unsigned long size;`
			`unsigned int hdr_size;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`enum object_type type;`
			`enum object_type real_type;`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`unsigned delta_depth;`
			`int base_object_no;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`};`

teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`union delta_base {`
			`unsigned char sha1[20];`
add overflow tests on pack offset variables Change a few size and offset variables to more appropriate types, then add overflow tests on those offsets. This prevents bad data from being generated or processed if off_t happens not to be large enough to handle some big packs. Better be safe than sorry. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:30 +02:00			`off_t offset;`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`};`

index-pack: Refactor base arguments of resolve_delta into a struct We need to discard base objects which are not recently used if our memory gets low, such as when we are unpacking a long delta chain of a very large object. To support tracking the available base objects we combine the pointer and size into a struct. Future changes would allow the data pointer to be free'd and marked NULL if memory gets low. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:44 +02:00			`struct base_data {`
index-pack: Chain the struct base_data on the stack for traversal We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:45 +02:00			`struct base_data *base;`
			`struct base_data *child;`
index-pack: Track the object_entry that creates each base_data If we free the data stored within a base_data we need the struct object_entry to get the data back again for use with another dependent delta. Storing the object_entry* in base_data makes it simple to call get_data_from_pack() to recover the compressed information. This however means that we must add the missing base object to the end of our packfile prior to calling resolve_delta() on each of the dependent deltas. Adding the base first ensures we can read the base back from the pack we are indexing, as if it had been included by the remote side. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:46 +02:00			`struct object_entry *obj;`
index-pack: Refactor base arguments of resolve_delta into a struct We need to discard base objects which are not recently used if our memory gets low, such as when we are unpacking a long delta chain of a very large object. To support tracking the available base objects we combine the pointer and size into a struct. Future changes would allow the data pointer to be free'd and marked NULL if memory gets low. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:44 +02:00			`void *data;`
			`unsigned long size;`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`int ref_first, ref_last;`
			`int ofs_first, ofs_last;`
index-pack: Refactor base arguments of resolve_delta into a struct We need to discard base objects which are not recently used if our memory gets low, such as when we are unpacking a long delta chain of a very large object. To support tracking the available base objects we combine the pointer and size into a struct. Future changes would allow the data pointer to be free'd and marked NULL if memory gets low. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:44 +02:00			`};`
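
The annotation on `ref_first`/`ofs_first` describes replacing recursion with a loop over heap-allocated nodes that each remember their own position among their children. A minimal sketch of that traversal pattern follows. It is not the code from index-pack: `demo_node`, `demo_walk` and the `child_start`/`child_end` arrays (children of node `i` are the ids in the half-open range `child_start[i]..child_end[i]`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node: lives on the heap and carries its own child cursor. */
struct demo_node {
	struct demo_node *parent; /* node to resume when this one is done */
	int id;
	int next_child;           /* next child id to hand out */
	int end_child;            /* one past the last child id */
};

static struct demo_node *demo_new_node(struct demo_node *parent, int id,
				       const int *child_start,
				       const int *child_end)
{
	struct demo_node *n = malloc(sizeof(*n));
	if (!n)
		abort();
	n->parent = parent;
	n->id = id;
	n->next_child = child_start[id];
	n->end_child = child_end[id];
	return n;
}

/*
 * Depth-first walk without recursion. Visited ids are written to `out`
 * in pre-order; the number of visited nodes is returned.
 */
static int demo_walk(int root, const int *child_start, const int *child_end,
		     int *out)
{
	int count = 0;
	struct demo_node *cur;

	assert(out != NULL);
	cur = demo_new_node(NULL, root, child_start, child_end);
	out[count++] = root;
	while (cur) {
		if (cur->next_child < cur->end_child) {
			/* A new child: link it to the tree and process it next. */
			int id = cur->next_child++;
			cur = demo_new_node(cur, id, child_start, child_end);
			out[count++] = id;
		} else {
			/* Node is done: free it and resume the parent. */
			struct demo_node *parent = cur->parent;
			free(cur);
			cur = parent;
		}
	}
	return count;
}
```

Because every node stores its own cursor, the loop can return to a parent and continue with its next child without any call-stack state.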

index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`struct thread_local {`
			`#ifndef NO_PTHREADS`
			`pthread_t thread;`
			`#endif`
			`struct base_data *base_cache;`
			`size_t base_cache_used;`
index-pack: work around thread-unsafe pread() Multi-threading of index-pack was disabled with c0f8654 (index-pack: Disable threading on cygwin - 2012-06-26), because pread() implementations for Cygwin and MSYS were not thread safe. Recent Cygwin does offer usable pread() and we enabled multi-threading with 103d530f (Cygwin 1.7 has thread-safe pread, 2013-07-19). Work around this problem on platforms with a thread-unsafe pread() emulation by opening one file handle per thread; it would prevent parallel pread() on different file handles from stepping on each other. Also remove NO_THREAD_SAFE_PREAD that was introduced in c0f8654 because it's no longer used anywhere. This workaround is unconditional, even for platforms with thread-safe pread() because the overhead is small (a couple file handles more) and not worth fragmenting the code. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Tested-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-25 14:41:41 +01:00			`int pack_fd;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`};`
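
The annotation on `pack_fd` explains the workaround of opening one file handle per thread so that positional reads do not interfere with each other. The sketch below shows only that idea, under stated assumptions: `demo_worker`, `demo_open_workers`, `demo_close_workers` and `demo_read_at` are hypothetical names, no threads are actually started, and error handling is reduced to a return code.

```c
#define _POSIX_C_SOURCE 200809L /* for pread() under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-worker state; only the private descriptor is modelled. */
struct demo_worker {
	int pack_fd;
};

/* Open `path` once per worker. Returns 0, or -1 after closing what was opened. */
static int demo_open_workers(struct demo_worker *w, int n, const char *path)
{
	int i;

	assert(w != NULL && n >= 0);
	for (i = 0; i < n; i++) {
		w[i].pack_fd = open(path, O_RDONLY);
		if (w[i].pack_fd < 0) {
			while (i-- > 0)
				close(w[i].pack_fd);
			return -1;
		}
	}
	return 0;
}

static void demo_close_workers(struct demo_worker *w, int n)
{
	int i;

	for (i = 0; i < n; i++)
		close(w[i].pack_fd);
}

/*
 * Read exactly `len` bytes at `offset` through this worker's own
 * descriptor. Returns 0 on success, -1 on error or premature EOF.
 */
static int demo_read_at(const struct demo_worker *w, void *buf, size_t len,
			off_t offset)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = pread(w->pack_fd, p, len, offset);
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
		offset += n;
	}
	return 0;
}
```

In a threaded program each thread would be handed its own `struct demo_worker`, so no two threads ever call `pread()` on the same descriptor.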

index-pack: compare only the first 20-bytes of the key. The "union delta_base" is a strange beast. It is a 20-byte binary blob key to search a binary searchable deltas[] array, each element of which uses it to represent its base object with either a full 20-byte SHA-1 or an offset in the pack. Which representation is used is determined by another field of the deltas[] array element, obj->type, so there is no room for confusion, as long as we make sure we compare the keys for the same type only with appropriate length. The code compared the full union with memcmp(). When storing the in-pack offset, the union was first cleared before storing an unsigned long, so comparison worked fine. On 64-bit architectures, however, the union typically is 24-byte long; the code did not clear the remaining 4-byte alignment padding when storing a full 20-byte SHA-1 representation. Using memcmp() to compare the whole union was wrong. This fixes the comparison to look at the first 20-bytes of the union, regardless of the architecture. As long as ulong is smaller than 20-bytes this works fine. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-17 22:23:26 +02:00			`/*`
			`* Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want`
			`* to memcmp() only the first 20 bytes.`
			`*/`
			`#define UNION_BASE_SZ 20`
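
The comment and commit note above explain why only the first 20 bytes of the union may be compared: the alignment padding after a 20-byte SHA-1 is never initialized. A minimal sketch of that comparison, with assumptions labeled: `demo_delta_base` is a hypothetical copy of the union, `uint64_t` stands in for `off_t`, and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BASE_SZ 20 /* mirrors UNION_BASE_SZ */

/* Hypothetical stand-in for union delta_base. */
union demo_delta_base {
	unsigned char sha1[20];
	uint64_t offset; /* assumption: replaces off_t for the sketch */
};

/* Store a SHA-1 key; bytes past the first 20 are deliberately left alone. */
static void demo_set_sha1_key(union demo_delta_base *key,
			      const unsigned char *sha1)
{
	memcpy(key->sha1, sha1, 20);
}

/* Store an offset key; clear the union first so all 20 key bytes are defined. */
static void demo_set_offset_key(union demo_delta_base *key, uint64_t offset)
{
	memset(key, 0, sizeof(*key));
	key->offset = offset;
}

/* Compare the 20 key bytes only, never the trailing padding. */
static int demo_compare_keys(const union demo_delta_base *a,
			     const union demo_delta_base *b)
{
	assert(sizeof(*a) >= DEMO_BASE_SZ);
	return memcmp(a, b, DEMO_BASE_SZ);
}
```

Comparing `sizeof(union demo_delta_base)` bytes instead would make two identical SHA-1 keys compare unequal whenever their padding bytes differ.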

index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduce broken objects or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`#define FLAG_LINK (1u<<20)`
			`#define FLAG_CHECKED (1u<<21)`

standardize brace placement in struct definitions In struct definitions, unlike functions, the prevailing style is for the opening brace to go on the same line as the struct name, like so: struct foo { int bar; char baz; }; Indeed, grepping for 'struct [a-z_] {$' yields about 5 times as many matches as 'struct [a-z_]*$'. Linus sayeth: Heretic people all over the world have claimed that this inconsistency is ... well ... inconsistent, but all right-thinking people know that (a) K&R are _right_ and (b) K&R are right. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-16 08:08:34 +01:00			`struct delta_entry {`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`union delta_base base;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`int obj_no;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`};`

			`static struct object_entry *objects;`
			`static struct delta_entry *deltas;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`static struct thread_local nothread_data;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`static int nr_objects;`
			`static int nr_deltas;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`static int nr_resolved_deltas;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`static int nr_threads;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overridden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`static int from_stdin;`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduce broken objects or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`static int strict;`
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. 
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-26 03:16:17 +02:00			`static int do_fsck_object;`
add progress status to index-pack This is more interesting to look at when performing a big fetch. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:32:59 +02:00			`static int verbose;`
index-pack: protect deepest_delta in multithread code deepest_delta is a global variable but is updated without protection in resolve_delta(), a multithreaded function. Add a new mutex for it, but only protect and update when it's actually used (i.e. show_stat is non-zero). Another variable that will not be updated is delta_depth in "struct object_entry" as it's only useful when show_stat is 1. Putting it in "if (show_stat)" makes it clearer. The local variable "stat" is renamed to "show_stat" after moving to global scope because the name "stat" conflicts with stat(2) syscall. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-19 14:01:15 +01:00			`static int show_stat;`
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. 
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-26 03:16:17 +02:00			`static int check_self_contained_and_connected;`
add progress status to index-pack This is more interesting to look at when performing a big fetch. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:32:59 +02:00
make struct progress an opaque type This allows for better management of progress "object" existence, as well as making the progress display implementation more independent from its callers. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:32 +01:00			`static struct progress *progress;`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`/* We always read in 4kB chunks. */`
			`static unsigned char input_buffer[4096];`
add overflow tests on pack offset variables Change a few size and offset variables to more appropriate type, then add overflow tests on those offsets. This prevents any bad data to be generated/processed if off_t happens to not be large enough to handle some big packs. Better be safe than sorry. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:30 +02:00			`static unsigned int input_offset, input_len;`
			`static off_t consumed_bytes;`
index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00			`static unsigned deepest_delta;`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`static git_SHA_CTX input_ctx;`
compute object CRC32 with index-pack Same as previous patch but for index-pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:32 +02:00			`static uint32_t input_crc32;`
index-pack: work around thread-unsafe pread() Multi-threading of index-pack was disabled with c0f8654 (index-pack: Disable threading on cygwin - 2012-06-26), because pread() implementations for Cygwin and MSYS were not thread safe. Recent Cygwin does offer usable pread() and we enabled multi-threading with 103d530f (Cygwin 1.7 has thread-safe pread, 2013-07-19). Work around this problem on platforms with a thread-unsafe pread() emulation by opening one file handle per thread; it would prevent parallel pread() on different file handles from stepping on each other. Also remove NO_THREAD_SAFE_PREAD that was introduced in c0f8654 because it's no longer used anywhere. This workaround is unconditional, even for platforms with thread-safe pread() because the overhead is small (a couple file handles more) and not worth fragmenting the code. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Tested-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-25 14:41:41 +01:00			`static int input_fd, output_fd;`
			`static const char *curr_pack;`
add the capability for index-pack to read from a stream 2006-10-20 20:45:21 +02:00
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`#ifndef NO_PTHREADS`

			`static struct thread_local *thread_data;`
			`static int nr_dispatched;`
			`static int threads_active;`

			`static pthread_mutex_t read_mutex;`
			`#define read_lock() lock_mutex(&read_mutex)`
			`#define read_unlock() unlock_mutex(&read_mutex)`

			`static pthread_mutex_t counter_mutex;`
			`#define counter_lock() lock_mutex(&counter_mutex)`
			`#define counter_unlock() unlock_mutex(&counter_mutex)`

			`static pthread_mutex_t work_mutex;`
			`#define work_lock() lock_mutex(&work_mutex)`
			`#define work_unlock() unlock_mutex(&work_mutex)`

index-pack: protect deepest_delta in multithread code 2013-03-19 14:01:15 +01:00			`static pthread_mutex_t deepest_delta_mutex;`
			`#define deepest_delta_lock() lock_mutex(&deepest_delta_mutex)`
			`#define deepest_delta_unlock() unlock_mutex(&deepest_delta_mutex)`

index-pack: support multithreaded delta resolving 2012-05-06 14:31:55 +02:00			`static pthread_key_t key;`

			`static inline void lock_mutex(pthread_mutex_t *mutex)`
			`{`
			`if (threads_active)`
			`pthread_mutex_lock(mutex);`
			`}`

			`static inline void unlock_mutex(pthread_mutex_t *mutex)`
			`{`
			`if (threads_active)`
			`pthread_mutex_unlock(mutex);`
			`}`
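
The two wrappers above only touch the mutex once `threads_active` is set, so a single-threaded run makes no pthread calls. A minimal standalone sketch of that pattern follows. All names in it (`locking_enabled`, `demo_lock`, `run_demo` and so on) are hypothetical and are not part of index-pack; the sketch assumes a POSIX threads environment.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names for illustration; only the pattern mirrors the listing. */
static int locking_enabled;
static pthread_mutex_t demo_mutex;
static int demo_counter;

/* Take the lock only when worker threads are actually running. */
static void demo_lock(void)
{
	if (locking_enabled)
		pthread_mutex_lock(&demo_mutex);
}

static void demo_unlock(void)
{
	if (locking_enabled)
		pthread_mutex_unlock(&demo_mutex);
}

/* Each worker increments the shared counter n times. */
static void *demo_worker(void *arg)
{
	int i, n = *(int *)arg;

	for (i = 0; i < n; i++) {
		demo_lock();
		demo_counter++;
		demo_unlock();
	}
	return NULL;
}

/* Run nr workers (1..8); returns the final counter, or -1 on error. */
int run_demo(int nr, int n)
{
	pthread_t tid[8];
	int i, started = 0;

	assert(n >= 0);
	if (nr < 1 || nr > 8)
		return -1;
	demo_counter = 0;
	if (nr == 1) {
		/* Single-threaded mode: no pthread call is made at all. */
		locking_enabled = 0;
		demo_worker(&n);
		return demo_counter;
	}
	/* Initialize at run time rather than statically. */
	pthread_mutex_init(&demo_mutex, NULL);
	locking_enabled = 1;
	for (i = 0; i < nr; i++) {
		if (pthread_create(&tid[i], NULL, demo_worker, &n))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	locking_enabled = 0;
	pthread_mutex_destroy(&demo_mutex);
	return started == nr ? demo_counter : -1;
}
```

Calling `run_demo(1, 1000)` exercises the lock-free path, while `run_demo(4, 1000)` goes through the mutex. The flag is flipped only while no worker is running, which is what makes the unguarded check inside the wrappers safe in this sketch.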

			`/*`
			`* Mutex and conditional variable can't be statically-initialized on Windows.`
			`*/`
			`static void init_thread(void)`
			`{`
index-pack: work around thread-unsafe pread() 2014-03-25 14:41:41 +01:00			`int i;`
index-pack: support multithreaded delta resolving 2012-05-06 14:31:55 +02:00			`init_recursive_mutex(&read_mutex);`
			`pthread_mutex_init(&counter_mutex, NULL);`
			`pthread_mutex_init(&work_mutex, NULL);`
index-pack: protect deepest_delta in multithread code 2013-03-19 14:01:15 +01:00			`if (show_stat)`
			`pthread_mutex_init(&deepest_delta_mutex, NULL);`
index-pack: support multithreaded delta resolving 2012-05-06 14:31:55 +02:00			`pthread_key_create(&key, NULL);`
			`thread_data = xcalloc(nr_threads, sizeof(*thread_data));`
index-pack: work around thread-unsafe pread() 2014-03-25 14:41:41 +01:00			`for (i = 0; i < nr_threads; i++) {`
			`thread_data[i].pack_fd = open(curr_pack, O_RDONLY);`
			`if (thread_data[i].pack_fd == -1)`
			`die_errno(_("unable to open %s"), curr_pack);`
			`}`

index-pack: support multithreaded delta resolving 2012-05-06 14:31:55 +02:00			`threads_active = 1;`
			`}`

			`static void cleanup_thread(void)`
			`{`
index-pack: work around thread-unsafe pread() 2014-03-25 14:41:41 +01:00			`int i;`
index-pack: support multithreaded delta resolving 2012-05-06 14:31:55 +02:00			`if (!threads_active)`
			`return;`
			`threads_active = 0;`
			`pthread_mutex_destroy(&read_mutex);`
			`pthread_mutex_destroy(&counter_mutex);`
			`pthread_mutex_destroy(&work_mutex);`
index-pack: protect deepest_delta in multithread code 2013-03-19 14:01:15 +01:00			`if (show_stat)`
			`pthread_mutex_destroy(&deepest_delta_mutex);`
index-pack: work around thread-unsafe pread() 2014-03-25 14:41:41 +01:00			`for (i = 0; i < nr_threads; i++)`
			`close(thread_data[i].pack_fd);`
index-pack: support multithreaded delta resolving 2012-05-06 14:31:55 +02:00			`pthread_key_delete(key);`
			`free(thread_data);`
			`}`

			`#else`

			`#define read_lock()`
			`#define read_unlock()`

			`#define counter_lock()`
			`#define counter_unlock()`

			`#define work_lock()`
			`#define work_unlock()`

index-pack: protect deepest_delta in multithread code 2013-03-19 14:01:15 +01:00			`#define deepest_delta_lock()`
			`#define deepest_delta_unlock()`

index-pack: support multithreaded delta resolving 2012-05-06 14:31:55 +02:00			`#endif`
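
The threaded section above opens one descriptor per thread (`thread_data[i].pack_fd`) so that concurrent `pread()` calls never share a file handle. The sketch below shows that idea in isolation, under stated assumptions: `struct demo_slot`, `demo_open_slots`, `demo_close_slots` and `demo_roundtrip` are hypothetical names, the temporary file path is illustrative, and the reads run sequentially rather than from real worker threads.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-worker state; stands in for a per-thread structure. */
struct demo_slot {
	int fd;	/* private descriptor for one worker */
};

/* Open the same file once per worker; returns NULL on any failure. */
struct demo_slot *demo_open_slots(const char *path, int nr)
{
	struct demo_slot *slots = calloc(nr, sizeof(*slots));
	int i;

	if (!slots)
		return NULL;
	for (i = 0; i < nr; i++) {
		slots[i].fd = open(path, O_RDONLY);
		if (slots[i].fd == -1) {
			/* Roll back the descriptors opened so far. */
			while (i-- > 0)
				close(slots[i].fd);
			free(slots);
			return NULL;
		}
	}
	return slots;
}

void demo_close_slots(struct demo_slot *slots, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		close(slots[i].fd);
	free(slots);
}

/* Write a small file, then read two ranges through different descriptors. */
int demo_roundtrip(void)
{
	char path[] = "/tmp/demo-pack-XXXXXX";
	char buf[2];
	struct demo_slot *slots;
	int ok, fd = mkstemp(path);

	if (fd == -1)
		return -1;
	if (write(fd, "abcdef", 6) != 6) {
		close(fd);
		unlink(path);
		return -1;
	}
	close(fd);
	slots = demo_open_slots(path, 3);
	unlink(path);	/* open descriptors stay valid after unlink on POSIX */
	if (!slots)
		return -1;
	ok = pread(slots[0].fd, buf, 2, 0) == 2 && !memcmp(buf, "ab", 2) &&
	     pread(slots[2].fd, buf, 2, 4) == 2 && !memcmp(buf, "ef", 2);
	demo_close_slots(slots, 3);
	return ok ? 0 : -1;
}
```

Because each slot owns its descriptor, an emulated `pread()` that seeks and restores the file position on one handle cannot disturb reads on another. The cost is a few extra open descriptors, in line with the trade-off the commit message describes.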


index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`static int mark_link(struct object *obj, int type, void *data)`
			`{`
			`if (!obj)`
			`return -1;`

			`if (type != OBJ_ANY && obj->type != type)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("object type mismatch at %s"), sha1_to_hex(obj->sha1));`
index-pack: introduce checking mode 2008-02-25 22:46:12 +01:00
			`obj->flags |= FLAG_LINK;`
			`return 0;`
			`}`

			`/* The content of each linked object must have been checked`
			`or it must be already present in the object database */`
clone: open a shortcut for connectivity check 2013-05-26 03:16:17 +02:00			`static unsigned check_object(struct object *obj)`
index-pack: introduce checking mode 2008-02-25 22:46:12 +01:00			`{`
			`if (!obj)`
clone: open a shortcut for connectivity check 2013-05-26 03:16:17 +02:00			`return 0;`
			`if (!(obj->flags & FLAG_LINK))`
			`return 0;`

			`if (!(obj->flags & FLAG_CHECKED)) {`
			`unsigned long size;`
			`int type = sha1_object_info(obj->sha1, &size);`
index-pack: distinguish missing objects from type errors When we fetch a pack that does not contain an object we expected to receive, we get an error like: $ git init --bare tmp.git && cd tmp.git $ git fetch ../parent.git [...] error: Could not read 964953ec7bcc0245cb1d0db4095455edd21a2f2e fatal: Failed to traverse parents of commit b8247b40caf6704fe52736cdece6d6aae87471aa error: ../parent.git did not send all necessary objects This comes from the check_everything_connected rev-list. If we try cloning the same repo (rather than a fetch), we end up using index-pack's --check-self-contained-and-connected option instead, which produces output like: $ git clone --no-local --bare parent.git tmp.git [...] fatal: object of unexpected type fatal: index-pack failed Not only is the sha1 missing, but it's a misleading message. There's no type problem, but rather a missing object problem; we don't notice the difference because we simply compare OBJ_BAD != OBJ_BLOB. Let's provide a different message for this case: $ git clone --no-local --bare parent.git tmp.git fatal: did not receive expected object 6b00a8c61ed379d5f925a72c1987c9c52129d364 fatal: index-pack failed While we're at it, let's also improve a true type mismatch error to look like fatal: object 6b00a8c61ed379d5f925a72c1987c9c52129d364: expected type blob, got tree Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-05-12 06:38:39 +02:00			`if (type <= 0)`
			`die(_("did not receive expected object %s"),`
			`sha1_to_hex(obj->sha1));`
			`if (type != obj->type)`
			`die(_("object %s: expected type %s, found %s"),`
			`sha1_to_hex(obj->sha1),`
			`typename(obj->type), typename(type));`
			`obj->flags |= FLAG_CHECKED;`
			`return 1;`
			`}`

			`return 0;`
			`}`

			`static unsigned check_objects(void)`
			`{`
			`unsigned i, max, foreign_nr = 0;`

			`max = get_max_object_index();`
			`for (i = 0; i < max; i++)`
			`foreign_nr += check_object(get_indexed_object(i));`
			`return foreign_nr;`
			`}`

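The counting logic of `check_object()` and `check_objects()` can be modelled in isolation. The following is only a sketch under stated assumptions: `struct sketch_object`, the `SKETCH_FLAG_*` bits, and the `present_locally` field are hypothetical stand-ins (the real code uses `struct object`, its own flag values, and an object lookup that calls `die()` on failure). It shows how an object that is linked to but was not itself checked gets counted once as "foreign".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real values are defined elsewhere in index-pack.c. */
#define SKETCH_FLAG_LINK    (1u << 0)  /* assumed: something in the pack refers to this object */
#define SKETCH_FLAG_CHECKED (1u << 1)  /* assumed: this object has already been verified */

struct sketch_object {
	unsigned flags;
	int present_locally;  /* stand-in for a successful object lookup */
};

/* Returns 1 when obj is referenced but had not been checked yet, else 0. */
static unsigned sketch_check_object(struct sketch_object *obj)
{
	if (!obj)
		return 0;
	if (!(obj->flags & SKETCH_FLAG_LINK))
		return 0;
	if (!(obj->flags & SKETCH_FLAG_CHECKED)) {
		/* The real code calls die() here; the sketch asserts instead. */
		assert(obj->present_locally);
		obj->flags |= SKETCH_FLAG_CHECKED;
		return 1;
	}
	return 0;
}

/* Sums the per-object results, mirroring the loop in check_objects(). */
static unsigned sketch_check_objects(struct sketch_object *objs, size_t nr)
{
	unsigned foreign_nr = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		foreign_nr += sketch_check_object(&objs[i]);
	return foreign_nr;
}
```

Because the checked flag is set on the first visit, a second pass over the same objects counts nothing.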

make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`/* Discard current buffer used content. */`
sparse fix: non-ANSI function declaration The declaration of discard_cache() in cache.h already has its "void". Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-18 13:07:06 +01:00			`static void flush(void)`
			`{`
			`if (input_offset) {`
			`if (output_fd >= 0)`
			`write_or_die(output_fd, input_buffer, input_offset);`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Update(&input_ctx, input_buffer, input_offset);`
Don't use memcpy when source and dest. buffers may overlap git-index-pack can call memcpy with overlapping source and destination buffers. The patch below makes it use memmove instead. If you want to demonstrate a failure, add the following two lines + if (input_offset < input_len) + abort (); before the existing memcpy call (shown in the patch below), and then run this: (cd t; sh ./t5500-fetch-pack.sh) Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-11 19:06:34 +01:00			`memmove(input_buffer, input_buffer + input_offset, input_len);`
			`input_offset = 0;`
			`}`
			`}`

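The `memmove()` call in `flush()` slides the unconsumed bytes to the front of the same buffer, so source and destination can overlap. A minimal sketch of that compaction step, assuming a hypothetical `struct sketch_buf` in place of the file-level `input_buffer`, `input_offset`, and `input_len` variables, and leaving out the write and hash update:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_buf {
	unsigned char data[16];
	size_t offset;  /* bytes at the front that were already consumed */
	size_t len;     /* unconsumed bytes starting at data + offset */
};

/* Drop the consumed prefix and slide the unconsumed bytes to the front. */
static void sketch_compact(struct sketch_buf *b)
{
	assert(b->offset + b->len <= sizeof(b->data));
	if (b->offset) {
		/* The regions may overlap, which memcpy() does not permit. */
		memmove(b->data, b->data + b->offset, b->len);
		b->offset = 0;
	}
}
```

With `offset = 2` and `len = 6`, the source range starts two bytes into the destination range, which is exactly the overlapping case that `memmove()` is specified to handle.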
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`/*`
			`* Make sure at least "min" bytes are available in the buffer, and`
			`* return the pointer to the buffer.`
			`*/`
index-pack: minor fixes to comment and function name Use proper english. Be more exact in one comment. [jc: I threw in a bit of style clean-up as well] Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-27 22:14:23 +02:00			`static void *fill(int min)`
			`{`
			`if (min <= input_len)`
			`return input_buffer + input_offset;`
			`if (min > sizeof(input_buffer))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(Q_("cannot fill %d byte",`
			`"cannot fill %d bytes",`
			`min),`
			`min);`
			`flush();`
			`do {`
Ensure return value from xread() is always stored into an ssize_t This patch fixes all calls to xread() where the return value is not stored into an ssize_t. The patch should not have any effect whatsoever, other than putting better/more appropriate type names on variables. Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-15 14:49:22 +02:00			`ssize_t ret = xread(input_fd, input_buffer + input_len,`
			`sizeof(input_buffer) - input_len);`
			`if (ret <= 0) {`
			`if (!ret)`
			`die(_("early EOF"));`
			`die_errno(_("read error on input"));`
			`}`
			`input_len += ret;`
make display of total transferred more accurate The throughput display needs a delay period before accounting and displaying anything. Yet it might be called after some amount of data has already been transferred. The display of total data is therefore accounted late and therefore smaller than the reality. Let's call display_throughput() with an absolute amount of transferred data instead of a relative number, and let the throughput code find the relative amount of data by itself as needed. This way the displayed total is always exact. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-05 04:15:41 +01:00			`if (from_stdin)`
			`display_throughput(progress, consumed_bytes + input_len);`
			`} while (input_len < min);`
			`return input_buffer;`
			`}`

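The `fill()` contract (guarantee at least `min` buffered bytes, refilling after compaction) pairs with `use()`, which marks bytes as consumed. The sketch below illustrates that pairing under explicit assumptions: `struct sketch_reader` and its in-memory `src` are hypothetical stand-ins for the global buffer state and `input_fd`, and failures return `NULL` rather than calling `die()`. Hashing, output, and progress display are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_reader {
	const unsigned char *src;  /* hypothetical in-memory stand-in for input_fd */
	size_t src_len, src_pos;
	unsigned char buf[8];
	size_t offset, len;        /* consumed prefix and unconsumed byte count */
};

/* Ensure at least min unconsumed bytes are buffered; NULL on failure. */
static const unsigned char *sketch_fill(struct sketch_reader *r, size_t min)
{
	if (min <= r->len)
		return r->buf + r->offset;
	if (min > sizeof(r->buf))
		return NULL;  /* the request can never fit in the buffer */
	/* Compact first so the free space is contiguous at the tail. */
	memmove(r->buf, r->buf + r->offset, r->len);
	r->offset = 0;
	while (r->len < min) {
		size_t room = sizeof(r->buf) - r->len;
		size_t left = r->src_len - r->src_pos;
		size_t n = left < room ? left : room;

		if (!n)
			return NULL;  /* early EOF */
		memcpy(r->buf + r->len, r->src + r->src_pos, n);
		r->src_pos += n;
		r->len += n;
	}
	return r->buf;
}

/* Mark bytes as consumed; the caller must have filled at least that many. */
static void sketch_use(struct sketch_reader *r, size_t bytes)
{
	assert(bytes <= r->len);
	r->len -= bytes;
	r->offset += bytes;
}
```

A caller alternates the two: ask for the bytes it needs with `sketch_fill()`, parse them, then release them with `sketch_use()`.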
			`static void use(int bytes)`
			`{`
			`if (bytes > input_len)`
			`die(_("used more bytes than were available"));`
compute object CRC32 with index-pack Same as previous patch but for index-pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:32 +02:00			`input_crc32 = crc32(input_crc32, input_buffer + input_offset, bytes);`
			`input_len -= bytes;`
			`input_offset += bytes;`
add overflow tests on pack offset variables Change a few size and offset variables to more appropriate type, then add overflow tests on those offsets. This prevents any bad data to be generated/processed if off_t happens to not be large enough to handle some big packs. Better be safe than sorry. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:30 +02:00
			`/* make sure off_t is sufficiently large not to wrap */`
do not depend on signed integer overflow Signed integer overflow is not defined in C, so do not depend on it. This fixes a problem with GCC 4.4.0 and -O3 where the optimizer would consider "consumed_bytes > consumed_bytes + bytes" as a constant expression, and never execute the die()-call. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-05 09:24:10 +02:00			`if (signed_add_overflows(consumed_bytes, bytes))`
			`die(_("pack too large for current definition of off_t"));`
			`consumed_bytes += bytes;`
			`}`
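The overflow guard above tests the addition before performing it, because signed overflow is undefined in C and a check made after the fact can be optimized away. A minimal sketch of that idea follows, assuming non-negative 64-bit counters; `add_would_overflow` and `advance_offset` are hypothetical names for illustration and are not git's `signed_add_overflows` macro.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a + b would exceed INT64_MAX.
 * Assumes both values are non-negative, as byte counters are.
 * The comparison is rearranged so the addition is never performed.
 */
static bool add_would_overflow(int64_t a, int64_t b)
{
	return b > INT64_MAX - a;
}

/*
 * Hypothetical counterpart of the counter update above: advance the
 * counter, or report failure and leave it untouched.
 */
static bool advance_offset(int64_t *consumed, int64_t bytes)
{
	if (add_would_overflow(*consumed, bytes))
		return false;
	*consumed += bytes;
	return true;
}
```

Returning a status instead of calling `die()` is a choice made only to keep the sketch testable in isolation.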
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`static const char *open_pack_file(const char *pack_name)`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`{`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`if (from_stdin) {`
			`input_fd = 0;`
			`if (!pack_name) {`
Appease Sun Studio by renaming "tmpfile" On Solaris the system headers define the "tmpfile" name, which'll cause Git compiled with Sun Studio 12 Update 1 to whine about us redefining the name: "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) Just renaming the "tmpfile" variable to "tmp_file" in the relevant places is the easiest way to fix this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-21 02:18:21 +01:00			`static char tmp_file[PATH_MAX];`
			`output_fd = odb_mkstemp(tmp_file, sizeof(tmp_file),`
Make sure objects/pack exists before creating a new pack In a repository created with git older than f49fb35 (git-init-db: create "pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is not created upon initialization. It was Ok because subdirectories are created as needed inside directories init-db creates, and back then, packfiles were recent invention. After the said commit, new codepaths started relying on the presense of objects/pack/ directory in the repository. This was exacerbated with 8b4eb6b (Do not perform cross-directory renames when creating packs, 2008-09-22) that moved the location temporary pack files are created from objects/ directory to objects/pack/ directory, because moving temporary to the final location was done carefully with lazy leading directory creation. Many packfile related operations in such an old repository can fail mysteriously because of this. This commit introduces two helper functions to make things work better. - odb_mkstemp() is a specialized version of mkstemp() to refactor the code and teach it to create leading directories as needed; - odb_pack_keep() refactors the code to create a ".keep" file while create leading directories as needed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-25 08:11:29 +01:00			`"pack/tmp_pack_XXXXXX");`
Appease Sun Studio by renaming "tmpfile" On Solaris the system headers define the "tmpfile" name, which'll cause Git compiled with Sun Studio 12 Update 1 to whine about us redefining the name: "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) Just renaming the "tmpfile" variable to "tmp_file" in the relevant places is the easiest way to fix this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-21 02:18:21 +01:00			`pack_name = xstrdup(tmp_file);`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`} else`
			`output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600);`
			`if (output_fd < 0)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("unable to create '%s'"), pack_name);`
index-pack: work around thread-unsafe pread() Multi-threaing of index-pack was disabled with c0f8654 (index-pack: Disable threading on cygwin - 2012-06-26), because pread() implementations for Cygwin and MSYS were not thread safe. Recent Cygwin does offer usable pread() and we enabled multi-threading with 103d530f (Cygwin 1.7 has thread-safe pread, 2013-07-19). Work around this problem on platforms with a thread-unsafe pread() emulation by opening one file handle per thread; it would prevent parallel pread() on different file handles from stepping on each other. Also remove NO_THREAD_SAFE_PREAD that was introduced in c0f8654 because it's no longer used anywhere. This workaround is unconditional, even for platforms with thread-safe pread() because the overhead is small (a couple file handles more) and not worth fragmenting the code. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Tested-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-25 14:41:41 +01:00			`nothread_data.pack_fd = output_fd;`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`} else {`
			`input_fd = open(pack_name, O_RDONLY);`
			`if (input_fd < 0)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("cannot open packfile '%s'"), pack_name);`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`output_fd = -1;`
index-pack: work around thread-unsafe pread() Multi-threaing of index-pack was disabled with c0f8654 (index-pack: Disable threading on cygwin - 2012-06-26), because pread() implementations for Cygwin and MSYS were not thread safe. Recent Cygwin does offer usable pread() and we enabled multi-threading with 103d530f (Cygwin 1.7 has thread-safe pread, 2013-07-19). Work around this problem on platforms with a thread-unsafe pread() emulation by opening one file handle per thread; it would prevent parallel pread() on different file handles from stepping on each other. Also remove NO_THREAD_SAFE_PREAD that was introduced in c0f8654 because it's no longer used anywhere. This workaround is unconditional, even for platforms with thread-safe pread() because the overhead is small (a couple file handles more) and not worth fragmenting the code. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Tested-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-25 14:41:41 +01:00			`nothread_data.pack_fd = input_fd;`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`}`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Init(&input_ctx);`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`return pack_name;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`

			`static void parse_pack_header(void)`
			`{`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`struct pack_header *hdr = fill(sizeof(struct pack_header));`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
			`/* Header consistency check */`
			`if (hdr->hdr_signature != htonl(PACK_SIGNATURE))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("pack signature mismatch"));`
remove delta-against-self bit After experimenting with code to add the ability to encode a delta against part of the deltified file, it turns out that resulting packs are _bigger_ than when this ability is not used. The raw delta output might be smaller, but it doesn't compress as well using gzip with a negative net saving on average. Said bit would in fact be more useful to allow for encoding the copying of chunks larger than 64KB providing more savings with large files. This will correspond to packs version 3. While the current code still produces packs version 2, it is made future proof so pack versions 2 and 3 are accepted. Any pack version 2 are compatible with version 3 since the redefined bit was never used before. When enough time has passed, code to use that bit to produce version 3 packs could be added. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-09 23:50:04 +01:00			`if (!pack_version_ok(hdr->hdr_version))`
i18n: mark more index-pack strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-08-31 14:13:04 +02:00			`die(_("pack version %"PRIu32" unsupported"),`
Fix some warnings (on cygwin) to allow -Werror When printing valuds of type uint32_t, we should use PRIu32, and should not assume that it is unsigned int. On 32-bit platforms, it could be defined as unsigned long. The same caution applies to ntohl(). Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-03 17:52:09 +02:00			`ntohl(hdr->hdr_version));`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
			`nr_objects = ntohl(hdr->hdr_entries);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`use(sizeof(struct pack_header));`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
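The header check above can be illustrated on a plain byte buffer. This is a sketch under stated assumptions: a 12-byte header made of three big-endian 32-bit fields (signature, version, object count), the signature spelling "PACK", and versions 2 and 3 accepted, as the commit message quoted above describes. The `sketch_` names and the return codes are hypothetical and not part of git.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PACK_SIGNATURE 0x5041434bu /* the bytes "PACK" */

/* Decode a big-endian 32-bit value without relying on ntohl(). */
static uint32_t sketch_read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical parser: returns 0 and stores the object count on success,
 * -1 for a short buffer or signature mismatch, -2 for an unsupported
 * version.
 */
static int sketch_parse_pack_header(const unsigned char *buf, size_t len,
				    uint32_t *nr_objects)
{
	uint32_t version;

	if (len < 12)
		return -1;
	if (sketch_read_be32(buf) != SKETCH_PACK_SIGNATURE)
		return -1;
	version = sketch_read_be32(buf + 4);
	if (version != 2 && version != 3)
		return -2;
	*nr_objects = sketch_read_be32(buf + 8);
	return 0;
}
```

The function above compares the raw field against `htonl(PACK_SIGNATURE)` instead of decoding it first; the two approaches should be equivalent, and decoding byte by byte simply avoids a dependency on the socket headers.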

increase portability of NORETURN declarations Some compilers (including at least MSVC) support NORETURN on function declarations, but only before the function-name. This patch makes it possible to define NORETURN to something meaningful for those compilers. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Jeff King <peff@peff.net> 2009-09-30 20:05:49 +02:00			`static NORETURN void bad_object(unsigned long offset, const char *format,`
			`...) __attribute__((format (printf, 2, 3)));`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
Fix sparse warnings Fix warnings from 'make check'. - These files don't include 'builtin.h' causing sparse to complain that cmd_* isn't declared: builtin/clone.c:364, builtin/fetch-pack.c:797, builtin/fmt-merge-msg.c:34, builtin/hash-object.c:78, builtin/merge-index.c:69, builtin/merge-recursive.c:22 builtin/merge-tree.c:341, builtin/mktag.c:156, builtin/notes.c:426 builtin/notes.c:822, builtin/pack-redundant.c:596, builtin/pack-refs.c:10, builtin/patch-id.c:60, builtin/patch-id.c:149, builtin/remote.c:1512, builtin/remote-ext.c:240, builtin/remote-fd.c:53, builtin/reset.c:236, builtin/send-pack.c:384, builtin/unpack-file.c:25, builtin/var.c:75 - These files have symbols which should be marked static since they're only file scope: submodule.c:12, diff.c:631, replace_object.c:92, submodule.c:13, submodule.c:14, trace.c:78, transport.c:195, transport-helper.c:79, unpack-trees.c:19, url.c:3, url.c:18, url.c:104, url.c:117, url.c:123, url.c:129, url.c:136, thread-utils.c:21, thread-utils.c:48 - These files redeclare symbols to be different types: builtin/index-pack.c:210, parse-options.c:564, parse-options.c:571, usage.c:49, usage.c:58, usage.c:63, usage.c:72 - These files use a literal integer 0 when they really should use a NULL pointer: daemon.c:663, fast-import.c:2942, imap-send.c:1072, notes-merge.c:362 While we're in the area, clean up some unused #includes in builtin files (mostly exec_cmd.h). Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-22 08:51:05 +01:00			`static NORETURN void bad_object(unsigned long offset, const char *format, ...)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
			`va_list params;`
			`char buf[1024];`

			`va_start(params, format);`
			`vsnprintf(buf, sizeof(buf), format, params);`
			`va_end(params);`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("pack has bad object at offset %lu: %s"), offset, buf);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`

index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`static inline struct thread_local *get_thread_data(void)`
			`{`
			`#ifndef NO_PTHREADS`
			`if (threads_active)`
			`return pthread_getspecific(key);`
			`assert(!threads_active &&`
			`"This should only be reached when all threads are gone");`
			`#endif`
			`return &nothread_data;`
			`}`

			`#ifndef NO_PTHREADS`
			`static void set_thread_data(struct thread_local *data)`
			`{`
			`if (threads_active)`
			`pthread_setspecific(key, data);`
			`}`
			`#endif`

index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`static struct base_data *alloc_base_data(void)`
			`{`
use xcalloc() to allocate zero-initialized memory Use xcalloc() instead of xmalloc() followed by memset() to allocate and zero out memory because it's shorter and avoids duplicating the function parameters. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-07-19 15:56:26 +02:00			`struct base_data *base = xcalloc(1, sizeof(struct base_data));`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`base->ref_last = -1;`
			`base->ofs_last = -1;`
			`return base;`
			`}`
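The allocator above zero-fills the structure and then sets the two "last" indices to -1. One plausible reading, offered here as an assumption, is that a zeroed "first" index paired with a "last" of -1 describes an empty range, so a loop from first to last runs zero times until real bounds are filled in. The struct below is a hypothetical stand-in, not git's `struct base_data`.

```c
#include <stdlib.h>

/* Hypothetical node carrying two inclusive index ranges. */
struct sketch_node {
	int ref_first, ref_last;
	int ofs_first, ofs_last;
	void *data;
};

/*
 * Allocate a zeroed node whose ranges start out empty (first > last).
 * Unlike xcalloc(), plain calloc() can return NULL, so the caller must
 * check the result.
 */
static struct sketch_node *sketch_alloc_node(void)
{
	struct sketch_node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->ref_last = -1;
	n->ofs_last = -1;
	return n;
}

/* Number of entries in an inclusive range; 0 when it is empty. */
static int sketch_range_len(int first, int last)
{
	return last >= first ? last - first + 1 : 0;
}
```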

index-pack: smarter memory usage during delta resolution There is no need to keep the base object data around after its last delta has been resolved. This also means that long delta chains with only one delta per base won't grow the cache size unnecessarily as the base will be freed before recursing down. To make it easy, find_delta_children() is modified so the first and last indices are initialized in all cases. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:58 +02:00			`static void free_base_data(struct base_data *c)`
			`{`
			`if (c->data) {`
			`free(c->data);`
			`c->data = NULL;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`get_thread_data()->base_cache_used -= c->size;`
index-pack: smarter memory usage during delta resolution There is no need to keep the base object data around after its last delta has been resolved. This also means that long delta chains with only one delta per base won't grow the cache size unnecessarily as the base will be freed before recursing down. To make it easy, find_delta_children() is modified so the first and last indices are initialized in all cases. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:58 +02:00			`}`
			`}`
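The accounting in `free_base_data()` above, together with the pruning loop that follows it, can be sketched without the threading machinery. In this simplified version the cache state lives in an explicit struct rather than in per-thread data; all `sketch_` names are hypothetical, and the walk order (from the head of the chain toward its children) mirrors the loop in the source.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chain node: one cached base object. */
struct sketch_base {
	struct sketch_base *child;
	void *data;
	size_t size;
};

/* Hypothetical cache state; the source keeps this per thread. */
struct sketch_cache {
	struct sketch_base *head;
	size_t used;
	size_t limit;
};

/* Drop one node's payload and give its bytes back to the budget. */
static void sketch_free_data(struct sketch_cache *cache,
			     struct sketch_base *c)
{
	if (c->data) {
		free(c->data);
		c->data = NULL;
		cache->used -= c->size;
	}
}

/*
 * Walk the chain from the head and free payloads until usage is back
 * under the limit, never touching the node the caller still needs.
 */
static void sketch_prune(struct sketch_cache *cache,
			 struct sketch_base *retain)
{
	struct sketch_base *b;

	for (b = cache->head; cache->used > cache->limit && b; b = b->child) {
		if (b->data && b != retain)
			sketch_free_data(cache, b);
	}
}
```

Only the payload is released; the node itself stays linked, so a freed base can be unpacked again later if it is needed, as the commit message quoted below explains.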

index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`static void prune_base_data(struct base_data *retain)`
			`{`
Fix various dead stores found by the clang static analyzer http-push.c::finish_request(): request is initialized by the for loop index-pack.c::free_base_data(): b is initialized by the for loop merge-recursive.c::process_renames(): move compare to narrower scope, and remove unused assignments to it remove unused variable renames2 xdiff/xdiffi.c::xdl_recs_cmp(): remove unused variable ec xdiff/xemit.c::xdl_emit_diff(): xche is always overwritten Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-15 22:01:20 +01:00			`struct base_data *b;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`struct thread_local *data = get_thread_data();`
			`for (b = data->base_cache;`
			`data->base_cache_used > delta_base_cache_limit && b;`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`b = b->child) {`
index-pack: smarter memory usage during delta resolution There is no need to keep the base object data around after its last delta has been resolved. This also means that long delta chains with only one delta per base won't grow the cache size unnecessarily as the base will be freed before recursing down. To make it easy, find_delta_children() is modified so the first and last indices are initialized in all cases. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:58 +02:00			`if (b->data && b != retain)`
			`free_base_data(b);`
			`}`
			`}`
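
The loop above walks the chain from its oldest entry and frees inflated data until usage is back within the limit, never touching the entry passed as `retain`. A minimal standalone sketch of that idea follows. The names `struct node`, `release_node` and `prune_chain` are hypothetical, and the sketch assumes that releasing an entry subtracts its size from the running total; it uses a plain `size_t` counter rather than Git's per-thread state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct base_data: one link in the chain. */
struct node {
	struct node *child; /* next, more recently added entry */
	void *data;         /* inflated payload, NULL once released */
	size_t size;        /* bytes charged to the cache while data is held */
};

/* Free one payload and subtract its size from the running total. */
static void release_node(struct node *n, size_t *used)
{
	if (!n->data)
		return;
	free(n->data);
	n->data = NULL;
	assert(*used >= n->size); /* accounting must never underflow */
	*used -= n->size;
}

/*
 * Walk from the oldest entry toward the newest and release payloads
 * until the total is within `limit`. The entry `retain` is skipped.
 * Returns the number of payloads released.
 */
static int prune_chain(struct node *head, const struct node *retain,
		       size_t *used, size_t limit)
{
	int released = 0;
	struct node *n;

	for (n = head; *used > limit && n; n = n->child) {
		if (n->data && n != retain) {
			release_node(n, used);
			released++;
		}
	}
	return released;
}
```

Because the walk starts at the head, the oldest bases are dropped first, and the walk stops as soon as the total fits. If `retain` alone exceeds the limit, the total stays above it.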

index-pack: Chain the struct base_data on the stack for traversal We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:45 +02:00			`static void link_base_data(struct base_data *base, struct base_data *c)`
			`{`
			`if (base)`
			`base->child = c;`
			`else`
			`get_thread_data()->base_cache = c;`

			`c->base = base;`
			`c->child = NULL;`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`if (c->data)`
			`get_thread_data()->base_cache_used += c->size;`
			`prune_base_data(c);`
			`}`

			`static void unlink_base_data(struct base_data *c)`
			`{`
			`struct base_data *base = c->base;`
			`if (base)`
			`base->child = NULL;`
			`else`
			`get_thread_data()->base_cache = NULL;`
			`free_base_data(c);`
			`}`
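
The two routines above keep a doubly linked chain whose head is the oldest entry: linking appends a new entry below its base, and unlinking detaches the newest one. The sketch below isolates just that bookkeeping. `struct frame`, `struct chain` and the helper names are hypothetical, the cache accounting and data freeing are left out, and the `assert` in `unlink_frame` is an added assumption that only the newest entry is ever removed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct base_data, reduced to its links. */
struct frame {
	struct frame *base;  /* entry this one was derived from (older) */
	struct frame *child; /* entry derived from this one (newer) */
};

/* Hypothetical stand-in for the per-thread state holding the chain head. */
struct chain {
	struct frame *head; /* oldest entry, i.e. bottom of the stack */
};

/* Append c after base, or make it the head when there is no base. */
static void link_frame(struct chain *ch, struct frame *base, struct frame *c)
{
	if (base)
		base->child = c;
	else
		ch->head = c;
	c->base = base;
	c->child = NULL;
}

/* Detach c, which is assumed to be the newest entry in the chain. */
static void unlink_frame(struct chain *ch, struct frame *c)
{
	assert(c->child == NULL);
	if (c->base)
		c->base->child = NULL;
	else
		ch->head = NULL;
}

/* Count entries by walking from the head through the child links. */
static int chain_depth(const struct chain *ch)
{
	int depth = 0;
	const struct frame *f;

	for (f = ch->head; f; f = f->child)
		depth++;
	return depth;
}
```

Calling `link_frame` on the way into a recursive step and `unlink_frame` on the way out keeps the chain in step with the call stack, which is what lets a pruning pass start from the head and reach every live entry.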

index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`static int is_delta_type(enum object_type type)`
			`{`
			`return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);`
			`}`

			`static void *unpack_entry_data(unsigned long offset, unsigned long size,`
			`enum object_type type, unsigned char *sha1)`
			`{`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`static char fixed_buf[8192];`
index-pack: rationalize unpack_entry_data() Rework the loop to remove duplicated calls to use() and fill(), and to make the code easier to read. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:12:06 +02:00			`int status;`
zlib: zlib can only process 4GB at a time The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 20:52:15 +02:00			`git_zstream stream;`
			`void *buf;`
			`git_SHA_CTX c;`
			`char hdr[32];`
			`int hdrlen;`

			`if (!is_delta_type(type)) {`
			`hdrlen = sprintf(hdr, "%s %lu", typename(type), size) + 1;`
			`git_SHA1_Init(&c);`
			`git_SHA1_Update(&c, hdr, hdrlen);`
			`} else`
			`sha1 = NULL;`
			`if (type == OBJ_BLOB && size > big_file_threshold)`
			`buf = fixed_buf;`
			`else`
			`buf = xmalloc(size);`

			`memset(&stream, 0, sizeof(stream));`
			`git_inflate_init(&stream);`
			`stream.next_out = buf;`
			`stream.avail_out = buf == fixed_buf ? sizeof(fixed_buf) : size;`

			`do {`
			`unsigned char *last_out = stream.next_out;`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`stream.next_in = fill(1);`
			`stream.avail_in = input_len;`
			`status = git_inflate(&stream, 0);`
			`use(input_len - stream.avail_in);`
			`if (sha1)`
			`git_SHA1_Update(&c, last_out, stream.next_out - last_out);`
			`if (buf == fixed_buf) {`
			`stream.next_out = buf;`
			`stream.avail_out = sizeof(fixed_buf);`
			`}`
			`} while (status == Z_OK);`
			`if (stream.total_out != size || status != Z_STREAM_END)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(offset, _("inflate returned %d"), status);`
Wrap inflate and other zlib routines for better error reporting R. Tyler Ballance reported a mysterious transient repository corruption; after much digging, it turns out that we were not catching and reporting memory allocation errors from some calls we make to zlib. This one _just_ wraps things; it doesn't do the "retry on low memory error" part, at least not yet. It is an independent issue from the reporting. Some of the errors are expected and passed back to the caller, but we die when zlib reports it failed to allocate memory for now. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-08 04:54:47 +01:00			`git_inflate_end(&stream);`
			`if (sha1)`
			`git_SHA1_Final(sha1, &c);`
			`return buf == fixed_buf ? NULL : buf;`
			`}`
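
For non-delta objects, the function above first hashes a small header built with `sprintf(hdr, "%s %lu", ...) + 1`, so the header is the type name, a space, the decimal size and the terminating NUL. The sketch below shows only that length calculation. `format_object_header` is a hypothetical helper, not a Git function, and it uses `snprintf` with an explicit bounds check as an assumption about how one might guard the fixed-size buffer.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "<type> <size>" into hdr and return the header length
 * including the terminating NUL, or -1 if it does not fit.
 */
static int format_object_header(char *hdr, size_t hdrsz,
				const char *type, unsigned long size)
{
	int n;

	assert(hdr && type);
	n = snprintf(hdr, hdrsz, "%s %lu", type, size);
	if (n < 0 || (size_t)n >= hdrsz)
		return -1; /* encoding error or header would not fit */
	return n + 1; /* the NUL byte is counted as part of the header */
}
```

With a type of `"blob"` and a size of 12, the text `blob 12` is seven characters, so the returned length is 8. That is the byte count that would be fed to the hash before any object content.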

			`static void unpack_raw_entry(struct object_entry *obj,`
			`union delta_base *delta_base,`
			`unsigned char *sha1)`
			`{`
Fix big left-shifts of unsigned char Shifting 'unsigned char' or 'unsigned short' left can result in sign extension errors, since the C integer promotion rules means that the unsigned char/short will get implicitly promoted to a signed 'int' due to the shift (or due to other operations). This normally doesn't matter, but if you shift things up sufficiently, it will now set the sign bit in 'int', and a subsequent cast to a bigger type (eg 'long' or 'unsigned long') will now sign-extend the value despite the original expression being unsigned. One example of this would be something like unsigned long size; unsigned char c; size += c << 24; where despite all the variables being unsigned, 'c << 24' ends up being a signed entity, and will get sign-extended when then doing the addition in an 'unsigned long' type. Since git uses 'unsigned char' pointers extensively, we actually have this bug in a couple of places. I may have missed some, but this is the result of looking at git grep '[^0-9 ][ ]<<[ ][a-z]' -- '.c' '.h' git grep '<<[ ]24' which catches at least the common byte cases (shifting variables by a variable amount, and shifting by 24 bits). I also grepped for just 'unsigned char' variables in general, and converted the ones that most obviously ended up getting implicitly cast immediately anyway (eg hash_name(), encode_85()). In addition to just avoiding 'unsigned char', this patch also tries to use a common idiom for the delta header size thing. We had three different variations on it: "& 0x7fUL" in one place (getting the sign extension right), and "& ~0x80" and "& 0x7f" in two other places (not getting it right). Apart from making them all just avoid using "unsigned char" at all, I also unified them to then use a simple "& 0x7f". I considered making a sparse extension which warns about doing implicit casts from unsigned types to signed types, but it gets rather complex very quickly, so this is just a hack. 
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-18 02:22:27 +02:00			`unsigned char *p;`
			`unsigned long size, c;`
add overflow tests on pack offset variables Change a few size and offset variables to more appropriate type, then add overflow tests on those offsets. This prevents any bad data to be generated/processed if off_t happens to not be large enough to handle some big packs. Better be safe than sorry. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:30 +02:00			`off_t base_offset;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`unsigned shift;`
compute object CRC32 with index-pack Same as previous patch but for index-pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:32 +02:00			`void *data;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`obj->idx.offset = consumed_bytes;`
sparse: Fix errors and silence warnings * load_file() returns a void pointer but is using 0 for the return value * builtin/receive-pack.c forgot to include builtin.h * packet_trace_prefix can be marked static * ll_merge takes a pointer for its last argument, not an int * crc32 expects a pointer as the second argument but Z_NULL is defined to be 0 (see 38f4d13 sparse fix: Using plain integer as NULL pointer, 2006-11-18 for more info) Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-04-03 09:06:54 +02:00			`input_crc32 = crc32(0, NULL, 0);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00
			`p = fill(1);`
			`c = *p;`
			`use(1);`
			`obj->type = (c >> 4) & 7;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`size = (c & 15);`
			`shift = 4;`
			`while (c & 0x80) {`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`p = fill(1);`
			`c = *p;`
			`use(1);`
Fix big left-shifts of unsigned char Shifting 'unsigned char' or 'unsigned short' left can result in sign extension errors, since the C integer promotion rules means that the unsigned char/short will get implicitly promoted to a signed 'int' due to the shift (or due to other operations). This normally doesn't matter, but if you shift things up sufficiently, it will now set the sign bit in 'int', and a subsequent cast to a bigger type (eg 'long' or 'unsigned long') will now sign-extend the value despite the original expression being unsigned. One example of this would be something like unsigned long size; unsigned char c; size += c << 24; where despite all the variables being unsigned, 'c << 24' ends up being a signed entity, and will get sign-extended when then doing the addition in an 'unsigned long' type. Since git uses 'unsigned char' pointers extensively, we actually have this bug in a couple of places. I may have missed some, but this is the result of looking at git grep '[^0-9 ][ ]<<[ ][a-z]' -- '.c' '.h' git grep '<<[ ]24' which catches at least the common byte cases (shifting variables by a variable amount, and shifting by 24 bits). I also grepped for just 'unsigned char' variables in general, and converted the ones that most obviously ended up getting implicitly cast immediately anyway (eg hash_name(), encode_85()). In addition to just avoiding 'unsigned char', this patch also tries to use a common idiom for the delta header size thing. We had three different variations on it: "& 0x7fUL" in one place (getting the sign extension right), and "& ~0x80" and "& 0x7f" in two other places (not getting it right). Apart from making them all just avoid using "unsigned char" at all, I also unified them to then use a simple "& 0x7f". I considered making a sparse extension which warns about doing implicit casts from unsigned types to signed types, but it gets rather complex very quickly, so this is just a hack. 
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-18 02:22:27 +02:00			`size += (c & 0x7f) << shift;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`shift += 7;`
			`}`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`obj->size = size;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`switch (obj->type) {`
introduce delta objects with offset to base This adds a new object, namely OBJ_OFS_DELTA, renames OBJ_DELTA to OBJ_REF_DELTA to better make the distinction between those two delta objects, and adds support for the handling of those new delta objects in sha1_file.c only. The OBJ_OFS_DELTA contains a relative offset from the delta object's position in a pack instead of the 20-byte SHA1 reference to identify the base object. Since the base is likely to be not so far away, the relative offset is more likely to have a smaller encoding on average than an absolute offset. And for those delta objects the base must always be stored first because there is no way to know the distance of later objects when streaming a pack. Hence this relative offset is always meant to be negative. The offset encoding is slightly denser than the one used for object size -- credits to <linux@horizon.com> (whoever this is) for bringing it to my attention. This allows for pack size reduction between 3.2% (Linux-2.6) to over 5% (linux-historic). Runtime pack access should be faster too since delta replay does skip a search in the pack index for each delta in a chain. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:06:49 +02:00			`case OBJ_REF_DELTA:`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`hashcpy(delta_base->sha1, fill(20));`
			`use(20);`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`break;`
			`case OBJ_OFS_DELTA:`
			`memset(delta_base, 0, sizeof(*delta_base));`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`p = fill(1);`
			`c = *p;`
			`use(1);`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`base_offset = c & 127;`
			`while (c & 128) {`
			`base_offset += 1;`
make overflow test on delta base offset work regardless of variable size This patch introduces the MSB() macro to obtain the desired number of most significant bits from a given variable independently of the variable type. It is then used to better implement the overflow test on the OBJ_OFS_DELTA base offset variable with the property of always working correctly regardless of the type/size of that variable. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:29 +02:00			`if (!base_offset || MSB(base_offset, 7))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(obj->idx.offset, _("offset value overflow for delta base object"));`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`p = fill(1);`
			`c = *p;`
			`use(1);`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`base_offset = (base_offset << 7) + (c & 127);`
			`}`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`delta_base->offset = obj->idx.offset - base_offset;`
better validation on delta base object offsets In one case, it was possible to have a bad offset equal to 0 effectively pointing a delta onto itself and crashing git after too many recursions. In the other cases, a negative offset could result due to off_t being signed. Catch those. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-30 00:02:45 +01:00			`if (delta_base->offset <= 0 || delta_base->offset >= obj->idx.offset)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(obj->idx.offset, _("delta base offset is out of bound"));`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`break;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`case OBJ_COMMIT:`
			`case OBJ_TREE:`
			`case OBJ_BLOB:`
			`case OBJ_TAG:`
			`break;`
			`default:`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(obj->idx.offset, _("unknown object type %d"), obj->type);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`obj->hdr_size = consumed_bytes - obj->idx.offset;`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00
index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`data = unpack_entry_data(obj->idx.offset, obj->size, obj->type, sha1);`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`obj->idx.crc32 = input_crc32;`
compute object CRC32 with index-pack Same as previous patch but for index-pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:32 +02:00			`return data;`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`}`
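
The entry header loop above (type in bits 4..6 of the first byte, size in 7-bit groups, least significant group first) can be sketched on a plain byte buffer. This is a minimal illustration, not git's code: `decode_entry_header` is a hypothetical name, it reads from memory instead of calling `fill()`/`use()`, and the truncation and shift-width guards are additions made for the sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of git): decode the object type and the
 * inflated size from a pack entry header held in buf[0..len).
 * Returns the number of header bytes consumed, or 0 if the buffer ends
 * early or the shift would exceed the width of unsigned long.
 */
static size_t decode_entry_header(const unsigned char *buf, size_t len,
				  unsigned *type, unsigned long *size)
{
	size_t used = 0;
	unsigned long c, sz;	/* unsigned long avoids int promotion of the byte */
	unsigned shift = 4;

	assert(buf && type && size);
	if (!len)
		return 0;

	c = buf[used++];
	*type = (c >> 4) & 7;	/* bits 4..6: object type */
	sz = c & 15;		/* bits 0..3: lowest four size bits */

	while (c & 0x80) {	/* high bit set: another size byte follows */
		if (used >= len || shift >= sizeof(sz) * CHAR_BIT)
			return 0;
		c = buf[used++];
		sz += (c & 0x7f) << shift;
		shift += 7;
	}
	*size = sz;
	return used;
}
```

For the two bytes `0x95 0x0a`, the first byte gives type 1 and low size bits 5, and the second adds `10 << 4`, so the decoded size is 165. Holding the byte in an `unsigned long` rather than an `unsigned char` follows the reasoning of the "Fix big left-shifts of unsigned char" commit quoted above.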

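
The `OBJ_OFS_DELTA` branch above uses a different variable-length encoding: groups are most significant first, and 1 is added before each shift, so multi-byte encodings do not overlap the values that fit in fewer bytes. The following is a sketch under assumptions: `decode_ofs_delta` is a hypothetical name, the value is held in a `uint64_t` instead of `off_t`, and `v >> 57` stands in for `MSB(base_offset, 7)` on a 64-bit variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of git): decode the relative base offset
 * of an OBJ_OFS_DELTA entry from buf[0..len).
 * Returns the number of bytes consumed, or 0 on truncation or overflow.
 */
static size_t decode_ofs_delta(const unsigned char *buf, size_t len,
			       uint64_t *ofs)
{
	size_t used = 0;
	uint64_t c, v;

	assert(buf && ofs);
	if (!len)
		return 0;

	c = buf[used++];
	v = c & 127;
	while (c & 128) {	/* high bit set: another byte follows */
		v += 1;
		/* wrapped to zero, or top 7 bits in use: the shift would overflow */
		if (!v || (v >> 57))
			return 0;
		if (used >= len)
			return 0;
		c = buf[used++];
		v = (v << 7) + (c & 127);
	}
	*ofs = v;
	return used;
}
```

With this scheme one byte covers 0..127 and two bytes cover 128..16511 (`0x80 0x00` decodes to 128, `0xff 0x7f` to 16511). A caller would then subtract the result from the delta's own offset and reject a base position that is not strictly between 0 and that offset, as the function above does.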
index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`static void *unpack_data(struct object_entry *obj,`
			`int (*consume)(const unsigned char *, unsigned long, void *),`
			`void *cb_data)`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`{`
fix index-pack with packs >4GB containing deltas on 32-bit machines This probably hasn't been properly tested before. Here's a script to create a 8GB repo with the necessary characteristics (copy the test-genrandom executable from the Git build tree to /tmp first): ----- #!/bin/bash git init git config core.compression 0 # create big objects with no deltas for i in $(seq -w 1 2 63) do echo $i /tmp/test-genrandom $i 268435456 > file_$i git add file_$i rm file_$i echo "file_$i -delta" >> .gitattributes done # create "deltifiable" objects in between big objects for i in $(seq -w 2 2 64) do echo "$i $i $i" >> grow cp grow file_$i git add file_$i rm file_$i done rm grow # create a pack with them git commit -q -m "commit of big objects interlaced with small deltas" git repack -a -d ----- Then clone this repo over the Git protocol. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-11 05:29:10 +01:00			`off_t from = obj[0].idx.offset + obj[0].hdr_size;`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`unsigned long len = obj[1].idx.offset - from;`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`unsigned char *data, *inbuf;`
zlib: zlib can only process 4GB at a time The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 20:52:15 +02:00			`git_zstream stream;`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`int status;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`data = xmalloc(consume ? 64*1024 : obj->size);`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`inbuf = xmalloc((len < 64*1024) ? len : 64*1024);`

add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`memset(&stream, 0, sizeof(stream));`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`git_inflate_init(&stream);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`stream.next_out = data;`
index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`stream.avail_out = consume ? 64*1024 : obj->size;`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00
			`do {`
			`ssize_t n = (len < 64*1024) ? len : 64*1024;`
Merge branch 'nd/index-pack-one-fd-per-thread' Enable threaded index-pack on platforms without thread-unsafe pread() emulation. * nd/index-pack-one-fd-per-thread: index-pack: work around thread-unsafe pread() 2014-06-03 21:06:42 +02:00			`n = xpread(get_thread_data()->pack_fd, inbuf, n, from);`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`if (n < 0)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("cannot pread pack file"));`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`if (!n)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(Q_("premature end of pack file, %lu byte missing",`
			`"premature end of pack file, %lu bytes missing",`
			`len),`
			`len);`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`from += n;`
			`len -= n;`
			`stream.next_in = inbuf;`
			`stream.avail_in = n;`
index-pack: loop while inflating objects in unpack_data When the unpack_data function is given a consume() callback, it unpacks only 64K of the input at a time, feeding it to git_inflate along with a 64K output buffer. However, because we are inflating, there is a good chance that the output buffer will fill before consuming all of the input. In this case, we need to loop on git_inflate until we have fed the whole input buffer, feeding each chunk of output to the consume buffer. The current code does not do this, and as a result, will fail the loop condition and trigger a fatal "serious inflate inconsistency" error in this case. While we're rearranging the loop, let's get rid of the extra last_out pointer. It is meant to point to the beginning of the buffer that we feed to git_inflate, but in practice this is always the beginning of our same 64K buffer, because: 1. At the beginning of the loop, we are feeding the buffer. 2. At the end of the loop, if we are using a consume() function, we reset git_inflate's pointer to the beginning of the buffer. If we are not using a consume() function, then we do not care about the value of last_out at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-04 09:12:14 +02:00			`if (!consume)`
			`status = git_inflate(&stream, 0);`
			`else {`
			`do {`
			`status = git_inflate(&stream, 0);`
			`if (consume(data, stream.next_out - data, cb_data)) {`
			`free(inbuf);`
			`free(data);`
			`return NULL;`
			`}`
			`stream.next_out = data;`
			`stream.avail_out = 64*1024;`
			`} while (status == Z_OK && stream.avail_in);`
index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`}`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00			`} while (len && status == Z_OK && !stream.avail_in);`

			`/* This has been inflated OK when first encountered, so... */`
			`if (status != Z_STREAM_END || stream.total_out != obj->size)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("serious inflate inconsistency"));`
index-pack: smarter memory usage when resolving deltas In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in get_data_from_pack() in order to inflate it. Let's read and inflate the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 18:11:07 +02:00
			`git_inflate_end(&stream);`
			`free(inbuf);`
index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`if (consume) {`
			`free(data);`
			`data = NULL;`
			`}`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`return data;`
			`}`
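The inner `do { ... } while (status == Z_OK && stream.avail_in)` loop above keeps draining a fixed-size output buffer until the current input has been used up. A minimal sketch of that pattern follows. It uses a toy run-length decoder in place of zlib so that it stays self-contained; `rle_stream`, `rle_step`, `decode_all`, `sink` and `collect` are hypothetical names for illustration and are not part of index-pack.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { RLE_OK, RLE_END };

/* Toy stand-in for z_stream: input is (count, byte) pairs, count 0 ends the stream. */
struct rle_stream {
	const unsigned char *next_in;
	size_t avail_in;
	unsigned char *next_out;
	size_t avail_out;
	unsigned char pending_byte;
	unsigned pending_count;
};

/* Decode until the output buffer is full, the input runs dry, or the end marker is read. */
static int rle_step(struct rle_stream *s)
{
	for (;;) {
		while (s->pending_count && s->avail_out) {
			*s->next_out++ = s->pending_byte;
			s->avail_out--;
			s->pending_count--;
		}
		if (s->pending_count)
			return RLE_OK;	/* output buffer is full */
		if (s->avail_in < 2)
			return RLE_OK;	/* more input is needed */
		s->pending_count = s->next_in[0];
		s->pending_byte = s->next_in[1];
		s->next_in += 2;
		s->avail_in -= 2;
		if (!s->pending_count)
			return RLE_END;
	}
}

typedef int (*consume_fn)(const unsigned char *buf, size_t len, void *cb_data);

/* Feed every filled output buffer to consume(), then reuse the same buffer. */
static int decode_all(const unsigned char *in, size_t in_len,
		      unsigned char *out, size_t out_size,
		      consume_fn consume, void *cb_data)
{
	struct rle_stream s;
	int status;

	assert(out_size > 0);
	memset(&s, 0, sizeof(s));
	s.next_in = in;
	s.avail_in = in_len;
	s.next_out = out;
	s.avail_out = out_size;
	do {
		status = rle_step(&s);
		if (consume(out, (size_t)(s.next_out - out), cb_data))
			return -1;
		s.next_out = out;
		s.avail_out = out_size;
	} while (status == RLE_OK && s.avail_in);
	return status == RLE_END ? 0 : -1;
}

/* Example consumer: append each chunk to a small in-memory sink. */
struct sink {
	unsigned char data[64];
	size_t len;
	unsigned calls;
};

static int collect(const unsigned char *buf, size_t len, void *cb_data)
{
	struct sink *k = cb_data;

	if (k->len + len > sizeof(k->data))
		return 1;
	memcpy(k->data + k->len, buf, len);
	k->len += len;
	k->calls++;
	return 0;
}
```

With a 4-byte output buffer, the 8 decoded bytes of a short stream arrive through two `collect` calls. That is the situation the loop handles: the output buffer fills before the input is exhausted, so the decoder has to be called again on the same input.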

index-pack: factor out unpack core from get_data_from_pack This allows caller to consume large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:48 +02:00			`static void *get_data_from_pack(struct object_entry *obj)`
			`{`
			`return unpack_data(obj, NULL, NULL);`
			`}`

index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`static int compare_delta_bases(const union delta_base *base1,`
			`const union delta_base *base2,`
			`enum object_type type1,`
			`enum object_type type2)`
			`{`
			`int cmp = type1 - type2;`
			`if (cmp)`
			`return cmp;`
			`return memcmp(base1, base2, UNION_BASE_SZ);`
			`}`

			`static int find_delta(const union delta_base *base, enum object_type type)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
			`int first = 0, last = nr_deltas;`

			`while (first < last) {`
			`int next = (first + last) / 2;`
			`struct delta_entry *delta = &deltas[next];`
			`int cmp;`

index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`cmp = compare_delta_bases(base, &delta->base,`
			`type, objects[delta->obj_no].type);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`if (!cmp)`
			`return next;`
			`if (cmp < 0) {`
			`last = next;`
			`continue;`
			`}`
			`first = next+1;`
			`}`
			`return -first-1;`
			`}`
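`find_delta()` above is a binary search that returns the matching index on a hit and `-first-1` on a miss, so a negative result still encodes where the key would be inserted. The sketch below shows the same convention on a small table ordered by type first and then by key bytes, as `compare_delta_bases()` does. `KEY_SZ`, `struct entry`, `find_entry` and `insertion_point` are hypothetical names chosen for this example; the source uses `UNION_BASE_SZ` and its own structures.

```c
#include <assert.h>
#include <string.h>

#define KEY_SZ 4	/* hypothetical key width for this sketch */

struct entry {
	int type;
	unsigned char key[KEY_SZ];
};

/* Order by type first, then by the raw key bytes. */
static int compare_entries(int type1, const unsigned char *key1,
			   int type2, const unsigned char *key2)
{
	int cmp = type1 - type2;
	if (cmp)
		return cmp;
	return memcmp(key1, key2, KEY_SZ);
}

/* Returns the index of a match, or -(insertion point)-1 when there is none. */
static int find_entry(const struct entry *tab, int nr,
		      int type, const unsigned char *key)
{
	int first = 0, last = nr;

	assert(nr >= 0);
	while (first < last) {
		int next = first + (last - first) / 2;
		int cmp = compare_entries(type, key,
					  tab[next].type, tab[next].key);
		if (!cmp)
			return next;
		if (cmp < 0)
			last = next;
		else
			first = next + 1;
	}
	return -first - 1;
}

/* Decode a find_entry() result into the slot where the key is or would be. */
static int insertion_point(int result)
{
	return result < 0 ? -result - 1 : result;
}
```

Encoding the miss as `-first-1` rather than `-first` keeps slot 0 distinguishable from a hit at index 0, which is why callers test for `< 0` instead of comparing against a single sentinel.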

index-pack: smarter memory usage during delta resolution There is no need to keep the base object data around after its last delta has been resolved. This also means that long delta chains with only one delta per base won't grow the cache size unnecessarily as the base will be freed before recursing down. To make it easy, find_delta_children() is modified so the first and last indices are initialized in all cases. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:58 +02:00			`static void find_delta_children(const union delta_base *base,`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`int *first_index, int *last_index,`
			`enum object_type type)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`int first = find_delta(base, type);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`int last = first;`
			`int end = nr_deltas - 1;`

index-pack: smarter memory usage during delta resolution There is no need to keep the base object data around after its last delta has been resolved. This also means that long delta chains with only one delta per base won't grow the cache size unnecessarily as the base will be freed before recursing down. To make it easy, find_delta_children() is modified so the first and last indices are initialized in all cases. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:58 +02:00			`if (first < 0) {`
			`*first_index = 0;`
			`*last_index = -1;`
			`return;`
			`}`
index-pack: compare only the first 20-bytes of the key. The "union delta_base" is a strange beast. It is a 20-byte binary blob key to search a binary searchable deltas[] array, each element of which uses it to represent its base object with either a full 20-byte SHA-1 or an offset in the pack. Which representation is used is determined by another field of the deltas[] array element, obj->type, so there is no room for confusion, as long as we make sure we compare the keys for the same type only with appropriate length. The code compared the full union with memcmp(). When storing the in-pack offset, the union was first cleared before storing an unsigned long, so comparison worked fine. On 64-bit architectures, however, the union typically is 24-byte long; the code did not clear the remaining 4-byte alignment padding when storing a full 20-byte SHA-1 representation. Using memcmp() to compare the whole union was wrong. This fixes the comparison to look at the first 20-bytes of the union, regardless of the architecture. As long as ulong is smaller than 20-bytes this works fine. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-17 22:23:26 +02:00			`while (first > 0 && !memcmp(&deltas[first - 1].base, base, UNION_BASE_SZ))`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`--first;`
index-pack: compare only the first 20-bytes of the key. The "union delta_base" is a strange beast. It is a 20-byte binary blob key to search a binary searchable deltas[] array, each element of which uses it to represent its base object with either a full 20-byte SHA-1 or an offset in the pack. Which representation is used is determined by another field of the deltas[] array element, obj->type, so there is no room for confusion, as long as we make sure we compare the keys for the same type only with appropriate length. The code compared the full union with memcmp(). When storing the in-pack offset, the union was first cleared before storing an unsigned long, so comparison worked fine. On 64-bit architectures, however, the union typically is 24-byte long; the code did not clear the remaining 4-byte alignment padding when storing a full 20-byte SHA-1 representation. Using memcmp() to compare the whole union was wrong. This fixes the comparison to look at the first 20-bytes of the union, regardless of the architecture. As long as ulong is smaller than 20-bytes this works fine. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-17 22:23:26 +02:00			`while (last < end && !memcmp(&deltas[last + 1].base, base, UNION_BASE_SZ))`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`++last;`
			`*first_index = first;`
			`*last_index = last;`
			`}`
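`find_delta_children()` above starts from one binary-search hit and widens it in both directions to cover every neighbouring entry with the same key. On a miss it reports the empty range `first = 0, last = -1`, so a caller's `for (i = first; i <= last; i++)` loop simply does not run. The following sketch shows the same idea on a sorted array of integers; `find_equal_range` is a hypothetical helper, not a function from the source.

```c
#include <assert.h>

/*
 * Report the inclusive index range of all elements equal to key in a
 * sorted array. A miss yields the empty range first = 0, last = -1.
 */
static void find_equal_range(const int *sorted, int nr, int key,
			     int *first_index, int *last_index)
{
	int lo = 0, hi = nr, hit = -1;
	int first, last;

	assert(first_index && last_index);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		if (sorted[mid] == key) {
			hit = mid;
			break;
		}
		if (key < sorted[mid])
			hi = mid;
		else
			lo = mid + 1;
	}
	if (hit < 0) {
		*first_index = 0;
		*last_index = -1;
		return;
	}

	/* The hit can land anywhere inside the run, so widen both ways. */
	first = last = hit;
	while (first > 0 && sorted[first - 1] == key)
		--first;
	while (last < nr - 1 && sorted[last + 1] == key)
		++last;
	*first_index = first;
	*last_index = last;
}
```

The linear widening is cheap when runs of equal keys are short. For very long runs, two separate lower-bound and upper-bound searches would be an alternative, though that is not what the code above does.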

index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`struct compare_data {`
			`struct object_entry *entry;`
			`struct git_istream *st;`
			`unsigned char *buf;`
			`unsigned long buf_size;`
			`};`

			`static int compare_objects(const unsigned char *buf, unsigned long size,`
			`void *cb_data)`
			`{`
			`struct compare_data *data = cb_data;`

			`if (data->buf_size < size) {`
			`free(data->buf);`
			`data->buf = xmalloc(size);`
			`data->buf_size = size;`
			`}`

			`while (size) {`
			`ssize_t len = read_istream(data->st, data->buf, size);`
			`if (len == 0)`
			`die(_("SHA1 COLLISION FOUND WITH %s !"),`
			`sha1_to_hex(data->entry->idx.sha1));`
			`if (len < 0)`
			`die(_("unable to read %s"),`
			`sha1_to_hex(data->entry->idx.sha1));`
			`if (memcmp(buf, data->buf, len))`
			`die(_("SHA1 COLLISION FOUND WITH %s !"),`
			`sha1_to_hex(data->entry->idx.sha1));`
			`size -= len;`
			`buf += len;`
			`}`
			`return 0;`
			`}`

			`static int check_collison(struct object_entry *entry)`
			`{`
			`struct compare_data data;`
			`enum object_type type;`
			`unsigned long size;`

			`if (entry->size <= big_file_threshold || entry->type != OBJ_BLOB)`
			`return -1;`

			`memset(&data, 0, sizeof(data));`
			`data.entry = entry;`
			`data.st = open_istream(entry->idx.sha1, &type, &size, NULL);`
			`if (!data.st)`
			`return -1;`
			`if (size != entry->size || type != entry->type)`
			`die(_("SHA1 COLLISION FOUND WITH %s !"),`
			`sha1_to_hex(entry->idx.sha1));`
			`unpack_data(entry, compare_objects, &data);`
			`close_istream(data.st);`
			`free(data.buf);`
			`return 0;`
			`}`
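`compare_objects()` and `check_collison()` above compare a large blob chunk by chunk against a stream of the existing object, so neither copy has to be held in memory in full. A minimal sketch of that comparison follows, with an in-memory reader standing in for `read_istream()`. `mem_stream`, `mem_read` and `compare_chunk` are hypothetical names, and the sketch returns 1 on a mismatch where the source calls `die()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stream that hands out at most max_read bytes per call. */
struct mem_stream {
	const unsigned char *data;
	size_t len, pos, max_read;
};

static long mem_read(struct mem_stream *st, unsigned char *buf, size_t want)
{
	size_t left, n;

	assert(st->pos <= st->len);
	left = st->len - st->pos;
	n = want < left ? want : left;
	if (n > st->max_read)
		n = st->max_read;
	memcpy(buf, st->data + st->pos, n);
	st->pos += n;
	return (long)n;
}

/*
 * Compare one incoming chunk with the next bytes of the stream.
 * Returns 0 on a match, 1 on a mismatch or if the stream ends early.
 */
static int compare_chunk(const unsigned char *buf, size_t size,
			 struct mem_stream *st)
{
	unsigned char tmp[8];

	while (size) {
		size_t want = size < sizeof(tmp) ? size : sizeof(tmp);
		long len = mem_read(st, tmp, want);

		if (len == 0)
			return 1;	/* stream is shorter than the chunk */
		if (memcmp(buf, tmp, (size_t)len))
			return 1;	/* contents differ */
		size -= (size_t)len;
		buf += len;
	}
	return 0;
}
```

Because a read may return fewer bytes than requested, the loop advances by the length actually read rather than by the requested size, as the `while (size)` loop in `compare_objects()` does.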

index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`static void sha1_object(const void *data, struct object_entry *obj_entry,`
			`unsigned long size, enum object_type type,`
			`const unsigned char *sha1)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`void *new_data = NULL;`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`int collision_test_needed;`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00
			`assert(data || obj_entry);`

index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`read_lock();`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`collision_test_needed = has_sha1_file(sha1);`
			`read_unlock();`

			`if (collision_test_needed && !data) {`
			`read_lock();`
			`if (!check_collison(obj_entry))`
			`collision_test_needed = 0;`
			`read_unlock();`
			`}`
			`if (collision_test_needed) {`
don't ever allow SHA1 collisions to exist by fetching a pack Waaaaaaay back Git was considered to be secure as it never overwrote an object it already had. This was ensured by always unpacking the packfile received over the network (both in fetch and receive-pack) and our already existing logic to not create a loose object for an object we already have. Lately however we keep "large-ish" packfiles on both fetch and push by running them through index-pack instead of unpack-objects. This would let an attacker perform a birthday attack. How? Assume the attacker knows a SHA-1 that has two different data streams. He knows the client is likely to have the "good" one. So he sends the "evil" variant to the other end as part of a "large-ish" packfile. The recipient keeps that packfile, and indexes it. Now since this is a birthday attack there is a SHA-1 collision; two objects exist in the repository with the same SHA-1. They have very different data streams. One of them is "evil". Currently the poor recipient cannot tell the two objects apart, short of by examining the timestamp of the packfiles. But lets say the recipient repacks before he realizes he's been attacked. We may wind up packing the "evil" version of the object, and deleting the "good" one. This is made even more likely by Junio's recent rearrange_packed_git patch (b867092f). It is extremely unlikely for a SHA1 collisions to occur, but if it ever happens with a remote (hence untrusted) object we simply must not let the fetch succeed. Normally received packs should not contain objects we already have. But when they do we must ensure duplicated objects with the same SHA1 actually contain the same data. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-20 20:32:35 +01:00			`void *has_data;`
			`enum object_type has_type;`
			`unsigned long has_size;`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`read_lock();`
			`has_type = sha1_object_info(sha1, &has_size);`
			`if (has_type != type || has_size != size)`
			`die(_("SHA1 COLLISION FOUND WITH %s !"), sha1_to_hex(sha1));`
don't ever allow SHA1 collisions to exist by fetching a pack Waaaaaaay back Git was considered to be secure as it never overwrote an object it already had. This was ensured by always unpacking the packfile received over the network (both in fetch and receive-pack) and our already existing logic to not create a loose object for an object we already have. Lately however we keep "large-ish" packfiles on both fetch and push by running them through index-pack instead of unpack-objects. This would let an attacker perform a birthday attack. How? Assume the attacker knows a SHA-1 that has two different data streams. He knows the client is likely to have the "good" one. So he sends the "evil" variant to the other end as part of a "large-ish" packfile. The recipient keeps that packfile, and indexes it. Now since this is a birthday attack there is a SHA-1 collision; two objects exist in the repository with the same SHA-1. They have very different data streams. One of them is "evil". Currently the poor recipient cannot tell the two objects apart, short of by examining the timestamp of the packfiles. But lets say the recipient repacks before he realizes he's been attacked. We may wind up packing the "evil" version of the object, and deleting the "good" one. This is made even more likely by Junio's recent rearrange_packed_git patch (b867092f). It is extremely unlikely for a SHA1 collisions to occur, but if it ever happens with a remote (hence untrusted) object we simply must not let the fetch succeed. Normally received packs should not contain objects we already have. But when they do we must ensure duplicated objects with the same SHA1 actually contain the same data. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-20 20:32:35 +01:00			`has_data = read_sha1_file(sha1, &has_type, &has_size);`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`read_unlock();`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`if (!data)`
			`data = new_data = get_data_from_pack(obj_entry);`
don't ever allow SHA1 collisions to exist by fetching a pack Waaaaaaay back Git was considered to be secure as it never overwrote an object it already had. This was ensured by always unpacking the packfile received over the network (both in fetch and receive-pack) and our already existing logic to not create a loose object for an object we already have. Lately however we keep "large-ish" packfiles on both fetch and push by running them through index-pack instead of unpack-objects. This would let an attacker perform a birthday attack. How? Assume the attacker knows a SHA-1 that has two different data streams. He knows the client is likely to have the "good" one. So he sends the "evil" variant to the other end as part of a "large-ish" packfile. The recipient keeps that packfile, and indexes it. Now since this is a birthday attack there is a SHA-1 collision; two objects exist in the repository with the same SHA-1. They have very different data streams. One of them is "evil". Currently the poor recipient cannot tell the two objects apart, short of by examining the timestamp of the packfiles. But lets say the recipient repacks before he realizes he's been attacked. We may wind up packing the "evil" version of the object, and deleting the "good" one. This is made even more likely by Junio's recent rearrange_packed_git patch (b867092f). It is extremely unlikely for a SHA1 collisions to occur, but if it ever happens with a remote (hence untrusted) object we simply must not let the fetch succeed. Normally received packs should not contain objects we already have. But when they do we must ensure duplicated objects with the same SHA1 actually contain the same data. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-20 20:32:35 +01:00			`if (!has_data)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("cannot read existing object %s"), sha1_to_hex(sha1));`
don't ever allow SHA1 collisions to exist by fetching a pack Waaaaaaay back Git was considered to be secure as it never overwrote an object it already had. This was ensured by always unpacking the packfile received over the network (both in fetch and receive-pack) and our already existing logic to not create a loose object for an object we already have. Lately however we keep "large-ish" packfiles on both fetch and push by running them through index-pack instead of unpack-objects. This would let an attacker perform a birthday attack. How? Assume the attacker knows a SHA-1 that has two different data streams. He knows the client is likely to have the "good" one. So he sends the "evil" variant to the other end as part of a "large-ish" packfile. The recipient keeps that packfile, and indexes it. Now since this is a birthday attack there is a SHA-1 collision; two objects exist in the repository with the same SHA-1. They have very different data streams. One of them is "evil". Currently the poor recipient cannot tell the two objects apart, short of by examining the timestamp of the packfiles. But lets say the recipient repacks before he realizes he's been attacked. We may wind up packing the "evil" version of the object, and deleting the "good" one. This is made even more likely by Junio's recent rearrange_packed_git patch (b867092f). It is extremely unlikely for a SHA1 collisions to occur, but if it ever happens with a remote (hence untrusted) object we simply must not let the fetch succeed. Normally received packs should not contain objects we already have. But when they do we must ensure duplicated objects with the same SHA1 actually contain the same data. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-20 20:32:35 +01:00			`if (size != has_size || type != has_type ||`
			`memcmp(data, has_data, size) != 0)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("SHA1 COLLISION FOUND WITH %s !"), sha1_to_hex(sha1));`
Plug memory leak in index-pack collision checking codepath. 2007-04-03 18:33:46 +02:00			`free(has_data);`
index-pack: use streaming interface for collision test on large blobs When putting whole objects in core is unavoidable, try match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-24 15:55:44 +02:00			`}`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`if (strict) {`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`read_lock();`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`if (type == OBJ_BLOB) {`
			`struct blob *blob = lookup_blob(sha1);`
			`if (blob)`
			`blob->object.flags |= FLAG_CHECKED;`
			`else`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("invalid blob object %s"), sha1_to_hex(sha1));`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`} else {`
			`struct object *obj;`
			`int eaten;`
			`void *buf = (void *) data;`

index-pack: remove dead code (it should never happen) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-26 03:16:16 +02:00			`assert(data && "data can only be NULL for large _blobs_");`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`/*`
			`* we do not need to free the memory here, as the`
			`* buf is deleted by the caller.`
			`*/`
			`obj = parse_object_buffer(sha1, type, size, buf, &eaten);`
			`if (!obj)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("invalid %s"), typename(type));`
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. 
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-26 03:16:17 +02:00			`if (do_fsck_object &&`
fsck_object(): allow passing object data separately from the object itself When fsck'ing an incoming pack, we need to fsck objects that cannot be read via read_sha1_file() because they are not local yet (and might even be rejected if transfer.fsckobjects is set to 'true'). For commits, there is a hack in place: we basically cache commit objects' buffers anyway, but the same is not true, say, for tag objects. By refactoring fsck_object() to take the object buffer and size as optional arguments -- optional, because we still fall back to the previous method to look at the cached commit objects if the caller passes NULL -- we prepare the machinery for the upcoming handling of tag objects. The assumption that such buffers are inherently NUL terminated is now wrong, of course, hence we pass the size of the buffer so that we can add a sanity check later, to prevent running past the end of the buffer. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-09-10 15:52:51 +02:00			`fsck_object(obj, buf, size, 1,`
			`fsck_error_function))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("Error in object"));`
Fix various sparse warnings in the git source code There are a few remaining ones, but this fixes the trivial ones. It boils down to two main issues that sparse complains about: - warning: Using plain integer as NULL pointer Sparse doesn't like you using '0' instead of 'NULL'. For various good reasons, not the least of which is just the visual confusion. A NULL pointer is not an integer, and that whole "0 works as NULL" is a historical accident and not very pretty. A few of these remain: zlib is a total mess, and Z_NULL is just a 0. I didn't touch those. - warning: symbol 'xyz' was not declared. Should it be static? Sparse wants to see declarations for any functions you export. A lack of a declaration tends to mean that you should either add one, or you should mark the function 'static' to show that it's in file scope. A few of these remain: I only did the ones that should obviously just be made static. That 'wt_status_submodule_summary' one is debatable. It has a few related flags (like 'wt_status_use_color') which _are_ declared, and are used by builtin-commit.c. So maybe we'd like to export it at some point, but it's not declared now, and not used outside of that file, so 'static' it is in this patch. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-18 19:28:43 +02:00			`if (fsck_walk(obj, mark_link, NULL))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00
			`if (obj->type == OBJ_TREE) {`
			`struct tree *item = (struct tree *) obj;`
			`item->buffer = NULL;`
clear parsed flag when we free tree buffers Many code paths will free a tree object's buffer and set it to NULL after finishing with it in order to keep memory usage down during a traversal. However, out of 8 sites that do this, only one actually unsets the "parsed" flag back. Those sites that don't are setting a trap for later users of the tree object; even after calling parse_tree, the buffer will remain NULL, causing potential segfaults. It is not known whether this is triggerable in the current code. Most commands do not do an in-memory traversal followed by actually using the objects again. However, it does not hurt to be safe for future callers. In most cases, we can abstract this out to a "free_tree_buffer" helper. However, there are two exceptions: 1. The fsck code relies on the parsed flag to know that we were able to parse the object at one point. We can switch this to using a flag in the "flags" field. 2. The index-pack code sets the buffer to NULL but does not free it (it is freed by a caller). We should still unset the parsed flag here, but we cannot use our helper, as we do not want to free the buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-06 00:37:39 +02:00			`obj->parsed = 0;`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`}`
			`if (obj->type == OBJ_COMMIT) {`
			`struct commit *commit = (struct commit *) obj;`
commit: record buffer length in cache Most callsites which use the commit buffer try to use the cached version attached to the commit, rather than re-reading from disk. Unfortunately, that interface provides only a pointer to the NUL-terminated buffer, with no indication of the original length. For the most part, this doesn't matter. People do not put NULs in their commit messages, and the log code is happy to treat it all as a NUL-terminated string. However, some code paths do care. For example, when checking signatures, we want to be very careful that we verify all the bytes to avoid malicious trickery. This patch just adds an optional "size" out-pointer to get_commit_buffer and friends. The existing callers all pass NULL (there did not seem to be any obvious sites where we could avoid an immediate strlen() call, though perhaps with some further refactoring we could). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-06-10 23:44:13 +02:00			`if (detach_commit_buffer(commit, NULL) != data)`
provide a helper to free commit buffer This converts two lines into one at each caller. But more importantly, it abstracts the concept of freeing the buffer, which will make it easier to change later. Note that we also need to provide a "detach" mechanism for a tricky case in index-pack. We are passed a buffer for the object generated by processing the incoming pack. If we are not using --strict, we just calculate the sha1 on that buffer and return, leaving the caller to free it. But if we are using --strict, we actually attach that buffer to an object, pass the object to the fsck functions, and then detach the buffer from the object again (so that the caller can free it as usual). In this case, we don't want to free the buffer ourselves, but just make sure it is no longer associated with the commit. Note that we are making the assumption here that the attach/detach process does not impact the buffer at all (e.g., it is never reallocated or modified). That holds true now, and we have no plans to change that. However, as we abstract the commit_buffer code, this dependency becomes less obvious. So when we detach, let's also make sure that we get back the same buffer that we gave to the commit_buffer code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-06-13 00:05:37 +02:00			`die("BUG: parse_object_buffer transmogrified our buffer");`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`}`
			`obj->flags |= FLAG_CHECKED;`
			`}`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`read_unlock();`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`}`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00
			`free(new_data);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
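
The collision test described in the annotations above (compare type and size first, then the bytes) can be sketched in isolation. This is only an illustration under stated assumptions: `struct stored_object` and `is_collision()` are hypothetical names that do not exist in git, and the sketch ignores the streaming path used for large blobs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an object already present in the repository. */
struct stored_object {
	int type;
	size_t size;
	const void *data;
};

/*
 * Return 1 if an incoming object that hashes to the same name as `have`
 * differs from it in any way, 0 if the two are identical.
 * The cheap comparisons (type, size) run before the byte-by-byte memcmp().
 */
static int is_collision(const struct stored_object *have,
			int type, size_t size, const void *data)
{
	assert(have && (size == 0 || data));
	if (type != have->type || size != have->size)
		return 1;
	return size != 0 && memcmp(data, have->data, size) != 0;
}
```

A caller would treat a return value of 1 as fatal, in the same spirit as the `die()` call in the code above, rather than letting the second object into the repository.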

index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`/*`
			`* This function is part of find_unresolved_deltas(). There are two`
			`* walkers going in opposite directions.`
			`*`
			`* The first one in find_unresolved_deltas() traverses down from`
			`* parent node to children, inflating nodes along the way. However,`
			`* memory for inflated nodes is limited by delta_base_cache_limit, so`
			`* at some point parent node's inflated content may be freed.`
			`*`
			`* The second walker is this function, which goes from current node up`
			`* to top parent if necessary to inflate the node. In normal`
			`* situation, its parent node would be already inflated, so it just`
			`* needs to apply delta.`
			`*`
			`* In the worst case scenario, parent node is no longer inflated because`
			`* we're running out of delta_base_cache_limit; we need to re-inflate`
			`* parents, possibly up to the top base.`
			`*`
			`* All inflated objects here are subject to be freed if we exceed`
			`* delta_base_cache_limit, just like in find_unresolved_deltas(), we`
			`* just need to make sure the last node is not freed.`
			`*/`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`static void *get_base_data(struct base_data *c)`
			`{`
			`if (!c->data) {`
			`struct object_entry *obj = c->obj;`
index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`struct base_data **delta = NULL;`
			`int delta_nr = 0, delta_alloc = 0;`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00
index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`while (is_delta_type(c->obj->type) && !c->data) {`
			`ALLOC_GROW(delta, delta_nr + 1, delta_alloc);`
			`delta[delta_nr++] = c;`
			`c = c->base;`
			`}`
			`if (!delta_nr) {`
			`c->data = get_data_from_pack(obj);`
			`c->size = obj->size;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`get_thread_data()->base_cache_used += c->size;`
index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`prune_base_data(c);`
			`}`
			`for (; delta_nr > 0; delta_nr--) {`
			`void *base, *raw;`
			`c = delta[delta_nr - 1];`
			`obj = c->obj;`
			`base = get_base_data(c->base);`
			`raw = get_data_from_pack(obj);`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`c->data = patch_delta(`
			`base, c->base->size,`
			`raw, obj->size,`
			`&c->size);`
			`free(raw);`
			`if (!c->data)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(obj->idx.offset, _("failed to apply delta"));`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`get_thread_data()->base_cache_used += c->size;`
index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`prune_base_data(c);`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`}`
index-pack: eliminate unlimited recursion in get_base_data() Revese the order of delta applying so that by the time a delta is applied, its base is either non-delta or already inflated. get_base_data() is still recursive, but because base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:55 +01:00			`free(delta);`
index-pack: Honor core.deltaBaseCacheLimit when resolving deltas If we are trying to resolve deltas for a long delta chain composed of multi-megabyte objects we can easily run into requiring 500M+ of memory to hold each object in the chain on the call stack while we recurse into the dependent objects and resolve them. We now use a simple delta cache that discards objects near the bottom of the call stack first, as they are the most least recently used objects in this current delta chain. If we recurse out of a chain we may find the base object is no longer available, as it was free'd to keep memory under the deltaBaseCacheLimit. In such cases we must unpack the base object again, which will require recursing back to the root of the top of the delta chain as we released that root first. The astute reader will probably realize that we can still exceed the delta base cache limit, but this happens only if the most recent base plus the delta plus the inflated dependent sum up to more than the base cache limit. Due to the way patch_delta is currently implemented we cannot operate in less memory anyway. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-15 06:45:34 +02:00			`}`
			`return c->data;`
			`}`

index-pack: Refactor base arguments of resolve_delta into a struct We need to discard base objects which are not recently used if our memory gets low, such as when we are unpacking a long delta chain of a very large object. To support tracking the available base objects we combine the pointer and size into a struct. Future changes would allow the data pointer to be free'd and marked NULL if memory gets low. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:44 +02:00			`static void resolve_delta(struct object_entry *delta_obj,`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`struct base_data *base, struct base_data *result)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
fix multiple issues in index-pack Since commit 9441b61dc5, two issues affected correct behavior of index-pack: 1) The real_type of a delta object is the 'real_type' of its base, not the 'type' which can be a "delta type". Consequence of this is a corrupted pack index file which only needs to be recreated with a good index-pack command ('git verify-pack' will flag those). 2) The code sequence: result->data = patch_delta(get_base_data(base), base->obj->size, delta_data, delta_size, &result->size); has two issues of its own since base->obj->size should instead be base->size as we want the size of the actual object data and not the size of the delta object it is represented by. Except that simply replacing base->obj->size with base->size won't make the code more correct as the C language doesn't enforce a particular ordering for the evaluation of needed arguments for a function call, hence base->size could be pushed on the stack before get_base_data() which initializes base->size is called. Signed-off-by: Nicolas Pitre <nico@cam.org> Tested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-20 22:46:19 +02:00			`void *base_data, *delta_data;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
fix multiple issues in index-pack Since commit 9441b61dc5, two issues affected correct behavior of index-pack: 1) The real_type of a delta object is the 'real_type' of its base, not the 'type' which can be a "delta type". Consequence of this is a corrupted pack index file which only needs to be recreated with a good index-pack command ('git verify-pack' will flag those). 2) The code sequence: result->data = patch_delta(get_base_data(base), base->obj->size, delta_data, delta_size, &result->size); has two issues of its own since base->obj->size should instead be base->size as we want the size of the actual object data and not the size of the delta object it is represented by. Except that simply replacing base->obj->size with base->size won't make the code more correct as the C language doesn't enforce a particular ordering for the evaluation of needed arguments for a function call, hence base->size could be pushed on the stack before get_base_data() which initializes base->size is called. Signed-off-by: Nicolas Pitre <nico@cam.org> Tested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-20 22:46:19 +02:00			`delta_obj->real_type = base->obj->real_type;`
index-pack: protect deepest_delta in multithread code deepest_delta is a global variable but is updated without protection in resolve_delta(), a multithreaded function. Add a new mutex for it, but only protect and update when it's actually used (i.e. show_stat is non-zero). Another variable that will not be updated is delta_depth in "struct object_entry" as it's only useful when show_stat is 1. Putting it in "if (show_stat)" makes it clearer. The local variable "stat" is renamed to "show_stat" after moving to global scope because the name "stat" conflicts with stat(2) syscall. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-19 14:01:15 +01:00			`if (show_stat) {`
			`delta_obj->delta_depth = base->obj->delta_depth + 1;`
			`deepest_delta_lock();`
			`if (deepest_delta < delta_obj->delta_depth)`
			`deepest_delta = delta_obj->delta_depth;`
			`deepest_delta_unlock();`
			`}`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`delta_obj->base_object_no = base->obj - objects;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`delta_data = get_data_from_pack(delta_obj);`
fix multiple issues in index-pack Since commit 9441b61dc5, two issues affected correct behavior of index-pack: 1) The real_type of a delta object is the 'real_type' of its base, not the 'type' which can be a "delta type". Consequence of this is a corrupted pack index file which only needs to be recreated with a good index-pack command ('git verify-pack' will flag those). 2) The code sequence: result->data = patch_delta(get_base_data(base), base->obj->size, delta_data, delta_size, &result->size); has two issues of its own since base->obj->size should instead be base->size as we want the size of the actual object data and not the size of the delta object it is represented by. Except that simply replacing base->obj->size with base->size won't make the code more correct as the C language doesn't enforce a particular ordering for the evaluation of needed arguments for a function call, hence base->size could be pushed on the stack before get_base_data() which initializes base->size is called. Signed-off-by: Nicolas Pitre <nico@cam.org> Tested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-20 22:46:19 +02:00			`base_data = get_base_data(base);`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`result->obj = delta_obj;`
fix multiple issues in index-pack Since commit 9441b61dc5, two issues affected correct behavior of index-pack: 1) The real_type of a delta object is the 'real_type' of its base, not the 'type' which can be a "delta type". Consequence of this is a corrupted pack index file which only needs to be recreated with a good index-pack command ('git verify-pack' will flag those). 2) The code sequence: result->data = patch_delta(get_base_data(base), base->obj->size, delta_data, delta_size, &result->size); has two issues of its own since base->obj->size should instead be base->size as we want the size of the actual object data and not the size of the delta object it is represented by. Except that simply replacing base->obj->size with base->size won't make the code more correct as the C language doesn't enforce a particular ordering for the evaluation of needed arguments for a function call, hence base->size could be pushed on the stack before get_base_data() which initializes base->size is called. Signed-off-by: Nicolas Pitre <nico@cam.org> Tested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-20 22:46:19 +02:00			`result->data = patch_delta(base_data, base->size,`
			`delta_data, delta_obj->size, &result->size);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`free(delta_data);`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`if (!result->data)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`bad_object(delta_obj->idx.offset, _("failed to apply delta"));`
index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`hash_sha1_file(result->data, result->size,`
			`typename(delta_obj->real_type), delta_obj->idx.sha1);`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`sha1_object(result->data, NULL, result->size, delta_obj->real_type,`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`delta_obj->idx.sha1);`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`counter_lock();`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`nr_resolved_deltas++;`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`counter_unlock();`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`}`
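
The "fix multiple issues in index-pack" commit message above notes that C does not fix the order in which function-call arguments are evaluated, which is why `resolve_delta()` stores the result of `get_base_data(base)` in a local before passing `base->size`. The following is a minimal sketch of that pattern, not the code from index-pack itself: `struct lazy_buf`, `load()`, `consume()` and `apply_safely()` are hypothetical stand-ins for `struct base_data`, `get_base_data()`, `patch_delta()` and the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct base_data: size is valid only after loading. */
struct lazy_buf {
	const char *data;
	size_t size;
};

/* Hypothetical stand-in for get_base_data(): sets b->size as a side effect. */
static const char *load(struct lazy_buf *b)
{
	if (!b->data) {
		b->data = "base object";
		b->size = strlen(b->data);
	}
	return b->data;
}

/* Hypothetical stand-in for patch_delta(): just reports the size it was given. */
static size_t consume(const char *data, size_t size)
{
	return data ? size : 0;
}

static size_t apply_safely(struct lazy_buf *b)
{
	/*
	 * Writing consume(load(b), b->size) would leave the order of the
	 * two argument evaluations unspecified, so b->size might be read
	 * before load() has set it. A separate statement sequences the
	 * side effect before the read.
	 */
	const char *data = load(b);

	return consume(data, b->size);
}
```

Calling `apply_safely()` on a zero-initialized `struct lazy_buf` passes the size set by `load()`, regardless of how the compiler orders argument evaluation.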

index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`static struct base_data *find_unresolved_deltas_1(struct base_data *base,`
			`struct base_data *prev_base)`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`{`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`if (base->ref_last == -1 && base->ofs_last == -1) {`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`union delta_base base_spec;`

			`hashcpy(base_spec.sha1, base->obj->idx.sha1);`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`find_delta_children(&base_spec,`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`&base->ref_first, &base->ref_last, OBJ_REF_DELTA);`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`memset(&base_spec, 0, sizeof(base_spec));`
			`base_spec.offset = base->obj->idx.offset;`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`find_delta_children(&base_spec,`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`&base->ofs_first, &base->ofs_last, OBJ_OFS_DELTA);`
index-pack: Chain the struct base_data on the stack for traversal We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:45 +02:00
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`if (base->ref_last == -1 && base->ofs_last == -1) {`
			`free(base->data);`
			`return NULL;`
			`}`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`link_base_data(prev_base, base);`
			`}`
index-pack: Chain the struct base_data on the stack for traversal We need to release earlier inflated base objects when memory gets low, which means we need to be able to walk up or down the stack to locate the objects we want to release, and free their data. The new link/unlink routines allow inserting and removing the struct base_data during recursion inside resolve_delta, and the global base_cache gives us the head of the chain (bottom of the stack) so we can traverse it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:45 +02:00
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`if (base->ref_first <= base->ref_last) {`
			`struct object_entry *child = objects + deltas[base->ref_first].obj_no;`
			`struct base_data *result = alloc_base_data();`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00
			`assert(child->real_type == OBJ_REF_DELTA);`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`resolve_delta(child, base, result);`
			`if (base->ref_first == base->ref_last && base->ofs_last == -1)`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00			`free_base_data(base);`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00
			`base->ref_first++;`
			`return result;`
teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`}`

index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`if (base->ofs_first <= base->ofs_last) {`
			`struct object_entry *child = objects + deltas[base->ofs_first].obj_no;`
			`struct base_data *result = alloc_base_data();`
index-pack: group the delta-base array entries also by type Entries in the delta_base array are only grouped by the bytepattern in the delta_base union, some of which have 20-byte object name of the base object (i.e. base for REF_DELTA objects), while others have sizeof(off_t) bytes followed by enough NULs to fill 20-byte. The loops to iterate through a range inside this array still needs to inspect the type of the delta, and skip over false hits. Group the entries also by type to eliminate the potential of false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-02 19:06:51 +01:00
			`assert(child->real_type == OBJ_OFS_DELTA);`
			`resolve_delta(child, base, result);`
			`if (base->ofs_first == base->ofs_last)`
			`free_base_data(base);`

			`base->ofs_first++;`
			`return result;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
index-pack: rationalize delta resolution code Instead of having strange loops for walking unresolved deltas with the same base duplicated in many places, let's rework the code so this is done in a single place instead. This simplifies callers quite a bit too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-17 21:57:57 +02:00			`unlink_base_data(base);`
			`return NULL;`
			`}`

			`static void find_unresolved_deltas(struct base_data *base)`
			`{`
			`struct base_data *new_base, *prev_base = NULL;`
			`for (;;) {`
			`new_base = find_unresolved_deltas_1(base, prev_base);`

			`if (new_base) {`
			`prev_base = base;`
			`base = new_base;`
			`} else {`
			`free(base);`
			`base = prev_base;`
			`if (!base)`
			`return;`
			`prev_base = base->base;`
			`}`
			`}`
			`}`
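The commit message above describes replacing recursion with a loop over heap-allocated nodes, where each node keeps its own cursor over its children. The following is a minimal, self-contained sketch of that general pattern, not the index-pack implementation: `struct node`, `struct frame`, `new_frame()`, `step()` and `walk()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node; not a structure from index-pack. */
struct node {
	int id;
	struct node **children;
	int nr_children;
};

/* Heap-allocated traversal frame (a hypothetical analogue of base_data). */
struct frame {
	struct frame *parent;    /* frame to resume once this one is done */
	const struct node *node;
	int next_child;          /* per-frame cursor over the children */
};

static struct frame *new_frame(const struct node *node, struct frame *parent)
{
	struct frame *f = malloc(sizeof(*f));
	if (!f)
		abort();
	f->parent = parent;
	f->node = node;
	f->next_child = 0;
	return f;
}

/* Return a frame for the next unvisited child, or NULL if none remain. */
static struct frame *step(struct frame *f)
{
	if (f->next_child < f->node->nr_children)
		return new_frame(f->node->children[f->next_child++], f);
	return NULL;
}

/*
 * Depth-first, pre-order walk without recursion. Node ids are stored in
 * out[] (at most max of them); the number of ids stored is returned.
 */
static int walk(const struct node *root, int *out, int max)
{
	int n = 0;
	struct frame *cur;

	assert(root && out && max > 0);
	cur = new_frame(root, NULL);
	out[n++] = root->id;
	while (cur) {
		struct frame *child = step(cur);
		if (child) {
			/* A new child: descend into it. */
			if (n < max)
				out[n++] = child->node->id;
			cur = child;
		} else {
			/* Frame exhausted: free it and resume the parent. */
			struct frame *parent = cur->parent;
			free(cur);
			cur = parent;
		}
	}
	return n;
}
```

Because every frame lives on the heap and remembers its own position, the depth of the tree is bounded by available memory rather than by the call stack.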

			`static int compare_delta_entry(const void *a, const void *b)`
			`{`
			`const struct delta_entry *delta_a = a;`
			`const struct delta_entry *delta_b = b;`
			`/* group by type (ref vs ofs) and then by value (sha-1 or offset) */`
			`return compare_delta_bases(&delta_a->base, &delta_b->base,`
			`objects[delta_a->obj_no].type,`
			`objects[delta_b->obj_no].type);`
			`}`
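The comparator above groups entries by type first and by value second, so that a range lookup never has to skip entries of the wrong kind. As a rough illustration of that ordering with the standard `qsort()`, here is a small sketch; `enum kind`, `struct entry`, `compare_entry()` and `sort_entries()` are hypothetical stand-ins and do not reproduce `compare_delta_bases()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the two delta kinds. */
enum kind { KIND_REF = 0, KIND_OFS = 1 };

struct entry {
	enum kind kind;
	unsigned char key[20];   /* object name, or offset padded with NULs */
	int obj_no;
};

/* Order by kind first, then by the raw bytes of the key. */
static int compare_entry(const void *a, const void *b)
{
	const struct entry *ea = a;
	const struct entry *eb = b;

	if (ea->kind != eb->kind)
		return ea->kind < eb->kind ? -1 : 1;
	return memcmp(ea->key, eb->key, sizeof(ea->key));
}

static void sort_entries(struct entry *e, size_t nr)
{
	assert(e || nr == 0);
	qsort(e, nr, sizeof(*e), compare_entry);
}
```

After sorting, all entries of one kind are contiguous, so a binary search restricted to that kind cannot produce a false hit on a key that merely shares the same byte pattern.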

index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`static void resolve_base(struct object_entry *obj)`
			`{`
			`struct base_data *base_obj = alloc_base_data();`
			`base_obj->obj = obj;`
			`base_obj->data = NULL;`
			`find_unresolved_deltas(base_obj);`
			`}`

index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`#ifndef NO_PTHREADS`
			`static void *threaded_second_pass(void *data)`
			`{`
			`set_thread_data(data);`
			`for (;;) {`
			`int i;`
index-pack: guard nr_resolved_deltas reads by lock The threaded parts of index-pack increment the number of resolved deltas in nr_resolved_deltas guarded by counter_mutex. However, the per-thread outer loop accessed nr_resolved_deltas without any locks. This is not wrong as such, since it doesn't matter all that much whether we get an outdated value. However, unless someone proves that this one lock makes all the performance difference, it would be much cleaner to guard _all_ accesses to the variable with the lock. The only such use is display_progress() in the threaded section (all others are in the conclude_pack() callchain outside the threaded part). To make it obvious that it cannot deadlock, move it out of work_mutex. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-19 15:16:41 +01:00			`counter_lock();`
			`display_progress(progress, nr_resolved_deltas);`
			`counter_unlock();`
			`work_lock();`
			`while (nr_dispatched < nr_objects &&`
			`is_delta_type(objects[nr_dispatched].type))`
			`nr_dispatched++;`
			`if (nr_dispatched >= nr_objects) {`
			`work_unlock();`
			`break;`
			`}`
			`i = nr_dispatched++;`
			`work_unlock();`

			`resolve_base(&objects[i]);`
			`}`
			`return NULL;`
			`}`
			`#endif`
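The worker loop above claims the next non-delta object while holding a lock and then resolves it after releasing the lock. The claiming step alone can be sketched as below, assuming a POSIX mutex; `struct work_queue` and `claim_next()` are hypothetical names, and the `is_delta` flag array stands in for the `is_delta_type()` check.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared work queue; "next" plays the role of nr_dispatched. */
struct work_queue {
	pthread_mutex_t lock;
	const int *is_delta;   /* nonzero marks entries that are skipped */
	size_t nr;
	size_t next;
};

/*
 * Claim the index of the next non-delta entry, or return -1 when the
 * queue is exhausted. Only the bookkeeping happens under the lock.
 */
static long claim_next(struct work_queue *q)
{
	long i = -1;

	assert(q && q->is_delta);
	pthread_mutex_lock(&q->lock);
	while (q->next < q->nr && q->is_delta[q->next])
		q->next++;
	if (q->next < q->nr)
		i = (long)q->next++;
	pthread_mutex_unlock(&q->lock);
	return i;
}
```

Each worker thread would call `claim_next()` in a loop and stop on -1; keeping the expensive per-item work outside the critical section is what lets several workers make progress at once.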

			`/*`
			`* First pass:`
			`* - find locations of all objects;`
			`* - calculate SHA1 of all non-delta objects;`
			`* - remember base (SHA1 or offset) for all deltas.`
			`*/`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00			`static void parse_pack_objects(unsigned char *sha1)`
			`{`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00			`int i, nr_delays = 0;`
			`struct delta_entry *delta = deltas;`
			`struct stat st;`

make progress "title" part of the common progress interface If the progress bar ends up in a box, better provide a title for it too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 20:10:07 +02:00			`if (verbose)`
add throughput display to index-pack ... and call it "Receiving objects" when over stdin to look clearer to end users. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:35 +01:00			`progress = start_progress(`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`from_stdin ? _("Receiving objects") : _("Indexing objects"),`
			`nr_objects);`
			`for (i = 0; i < nr_objects; i++) {`
			`struct object_entry *obj = &objects[i];`
index-pack: hash non-delta objects while reading from stream Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:46 +02:00			`void *data = unpack_raw_entry(obj, &delta->base, obj->idx.sha1);`
			`obj->real_type = obj->type;`
index-pack: a miniscule refactor Introduce a helper function that takes the type of an object and tell if it is a delta, as we seem to use this check in many places. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:14 +02:00			`if (is_delta_type(obj->type)) {`
			`nr_deltas++;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`delta->obj_no = i;`
			`delta++;`
			`} else if (!data) {`
			`/* large blobs, check later */`
			`obj->real_type = OBJ_BAD;`
			`nr_delays++;`
			`} else`
			`sha1_object(data, NULL, obj->size, obj->type, obj->idx.sha1);`
			`free(data);`
relax usage of the progress API Since it is now OK to pass a null pointer to display_progress() and stop_progress() resulting in a no-op, then we can simplify the code and remove a bunch of lines by not making those calls conditional all the time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:33 +01:00			`display_progress(progress, i+1);`
			`}`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`objects[i].idx.offset = consumed_bytes;`
			`stop_progress(&progress);`

			`/* Check pack integrity */`
			`flush();`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Final(sha1, &input_ctx);`
			`if (hashcmp(fill(20), sha1))`
			`die(_("pack is corrupted (SHA1 mismatch)"));`
mimic unpack-objects when --stdin is used with index-pack It appears that git-unpack-objects writes the last part of the input buffer to stdout after the pack has been parsed. This looks a bit suspicious since the last fill() might have filled the buffer up to the 4096 byte limit and more data might still be pending on stdin, but since this is about being a drop-in replacement for unpack-objects let's simply duplicate the same behavior for now. [jc: with fix-up appeared in Nico's sleep] Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:31:53 +02:00			`use(20);`
add the capability for index-pack to read from a stream This patch only adds the streaming capability to index-pack. Although the code is different it has the exact same functionality as before to make sure nothing broke. This is in preparation for receiving packs over the net, parse them on the fly, fix them up if they are "thin" packs, and keep the resulting pack instead of exploding it into loose objects. But such functionality should come separately. One immediate advantage of this patch is that index-pack can now deal with packs up to 4GB in size even on 32-bit architectures since the pack is not entirely mmap()'d all at once anymore. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 20:45:21 +02:00
			`/* If input_fd is a file, we should have reached its end now. */`
			`if (fstat(input_fd, &st))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("cannot fstat packfile"));`
git-bundle: assorted fixes This patch fixes issues mentioned by Junio, Nico and Simon: - I forgot to convert the usage string when removing the "--" from the subcommands, - a style fix in the bundle_header, - use xread() instead of read(), - use write_or_die() instead of write(), - make the bundle header extensible, - fail if the whitespace after a sha1 of a reference is missing, - close() the fds passed to a subprocess, - in verify_bundle(), do not use "rev-list --stdin", but rather pass the revs directly (avoiding a fork()), - fix a corrupted comment in show_object(), and - fix the size check in index_pack. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-22 19:14:14 +01:00			`if (S_ISREG(st.st_mode) &&`
			`lseek(input_fd, 0, SEEK_CUR) - input_len != st.st_size)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("pack has junk at the end"));`
index-pack: use streaming interface on large blobs (most of the time) unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold. sha1_object() may not be called on those objects because there's no actual content. sha1_object() is called later on those objects, where we can safely use get_data_from_pack() to retrieve blob content for checking. However we always do that when we definitely need the blob content. And we often don't. There are two cases when we may need object content. The first case is when we find an in-repo blob with the same SHA-1. We need to do collision test, byte-on-byte. If this test is on, the blob must be loaded on memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoid to send objects that client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality becase we deal with large _blobs_ here. Note: --verify (or git-verify-pack) a pack from current repository will trigger collision test on every object in the pack, which effectively disables this patch. This could be easily worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-23 16:09:47 +02:00
			`for (i = 0; i < nr_objects; i++) {`
			`struct object_entry *obj = &objects[i];`
			`if (obj->real_type != OBJ_BAD)`
			`continue;`
			`obj->real_type = obj->type;`
			`sha1_object(NULL, obj, obj->size, obj->type, obj->idx.sha1);`
			`nr_delays--;`
			`}`
			`if (nr_delays)`
			`die(_("confusion beyond insanity in parse_pack_objects()"));`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`}`

			`/*`
			`* Second pass:`
			`* - for all non-delta objects, look if it is used as a base for`
			`* deltas;`
			`* - if used as a base, uncompress the object and apply all deltas,`
			`* recursively checking if the resulting object is used as a base`
			`* for some more deltas.`
			`*/`
			`static void resolve_deltas(void)`
			`{`
			`int i;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
add progress status to index-pack This is more interesting to look at when performing a big fetch. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:32:59 +02:00			`if (!nr_deltas)`
			`return;`

teach git-index-pack about deltas with offset to base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-21 06:08:33 +02:00			`/* Sort deltas by base SHA1/offset for fast searching */`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`qsort(deltas, nr_deltas, sizeof(struct delta_entry),`
			`compare_delta_entry);`

make progress "title" part of the common progress interface If the progress bar ends up in a box, better provide a title for it too. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 20:10:07 +02:00			`if (verbose)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`progress = start_progress(_("Resolving deltas"), nr_deltas);`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00
			`#ifndef NO_PTHREADS`
			`nr_dispatched = 0;`
			`if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {`
			`init_thread();`
			`for (i = 0; i < nr_threads; i++) {`
			`int ret = pthread_create(&thread_data[i].thread, NULL,`
			`threaded_second_pass, thread_data + i);`
			`if (ret)`
i18n: mark more index-pack strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-08-31 14:13:04 +02:00			`die(_("unable to create thread: %s"),`
			`strerror(ret));`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`}`
			`for (i = 0; i < nr_threads; i++)`
			`pthread_join(thread_data[i].thread, NULL);`
			`cleanup_thread();`
			`return;`
			`}`
			`#endif`

Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`for (i = 0; i < nr_objects; i++) {`
			`struct object_entry *obj = &objects[i];`

index-pack: a miniscule refactor Introduce a helper function that takes the type of an object and tell if it is a delta, as we seem to use this check in many places. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:14 +02:00			`if (is_delta_type(obj->type))`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`continue;`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`resolve_base(obj);`
relax usage of the progress API Since it is now OK to pass a null pointer to display_progress() and stop_progress() resulting in a no-op, then we can simplify the code and remove a bunch of lines by not making those calls conditional all the time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:33 +01:00			`display_progress(progress, nr_resolved_deltas);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`}`
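The second pass sorts the delta list by base so that all deltas depending on a given base can be found with a binary search. The following is a minimal sketch of that idea, not the code from this file: `struct demo_delta`, `compare_demo_delta()` and `find_deltas_for_base()` are hypothetical names, and the key is simplified to a plain offset, whereas the real `compare_delta_entry()` compares a base that may be either a SHA-1 or an offset.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for index-pack's delta_entry. */
struct demo_delta {
	unsigned long base_offset; /* key: where the base object starts */
	int obj_no;                /* which object this delta belongs to */
};

/* qsort() comparator: order entries by ascending base offset. */
static int compare_demo_delta(const void *a, const void *b)
{
	const struct demo_delta *x = a, *y = b;

	if (x->base_offset < y->base_offset)
		return -1;
	return x->base_offset > y->base_offset;
}

/*
 * In an array sorted with compare_demo_delta(), find the half-open range
 * [*first, *end) of entries whose key equals base_offset.
 * Returns the number of matching entries.
 */
static size_t find_deltas_for_base(const struct demo_delta *d, size_t nr,
				   unsigned long base_offset,
				   size_t *first, size_t *end)
{
	size_t lo = 0, hi = nr;

	assert(first && end);
	/* Lower bound: first index whose key is >= base_offset. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (d[mid].base_offset < base_offset)
			lo = mid + 1;
		else
			hi = mid;
	}
	*first = lo;
	/* Upper bound: first index whose key is > base_offset. */
	hi = nr;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (d[mid].base_offset <= base_offset)
			lo = mid + 1;
		else
			hi = mid;
	}
	*end = lo;
	return *end - *first;
}
```

A caller would `qsort()` the array once with `compare_demo_delta()` and then call `find_deltas_for_base()` for each non-delta object; an empty range means the object is not used as a base.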

index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`/*`
			`* Third pass:`
			`* - append objects to convert thin pack to full pack if required`
			`* - write the final 20-byte SHA-1`
			`*/`
			`static void fix_unresolved_deltas(struct sha1file *f, int nr_unresolved);`
			`static void conclude_pack(int fix_thin_pack, const char *curr_pack, unsigned char *pack_sha1)`
			`{`
			`if (nr_deltas == nr_resolved_deltas) {`
			`stop_progress(&progress);`
			`/* Flush remaining pack final 20-byte SHA1. */`
			`flush();`
			`return;`
			`}`

			`if (fix_thin_pack) {`
			`struct sha1file *f;`
			`unsigned char read_sha1[20], tail_sha1[20];`
index-pack: fix buffer overflow caused by translations The translation of "completed with %d local objects" is put in a 48-byte buffer, which may be enough for English but not true for any translations. Convert it to use strbuf (i.e. no hard limit on translation length). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-16 02:25:18 +01:00			`struct strbuf msg = STRBUF_INIT;`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`int nr_unresolved = nr_deltas - nr_resolved_deltas;`
			`int nr_objects_initial = nr_objects;`
			`if (nr_unresolved <= 0)`
Merge branch 'nd/threaded-index-pack' Enables threading in index-pack to resolve base data in parallel. By Nguyễn Thái Ngọc Duy (3) and Ramsay Jones (1) * nd/threaded-index-pack: index-pack: disable threading if NO_PREAD is defined index-pack: support multithreaded delta resolving index-pack: restructure pack processing into three main functions compat/win32/pthread.h: Add an pthread_key_delete() implementation 2012-05-14 20:50:40 +02:00			`die(_("confusion beyond insanity"));`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`objects = xrealloc(objects,`
			`(nr_objects + nr_unresolved + 1)`
			`* sizeof(*objects));`
index-pack: always zero-initialize object_entry list Commit 38a4556 (index-pack: start learning to emulate "verify-pack -v", 2011-06-03) added a "delta_depth" counter to each "struct object_entry". Initially, all object entries have their depth set to 0; in resolve_delta, we then set the depth of each delta to "base + 1". Base entries never have their depth touched, and remain at 0. To ensure that all depths start at 0, that commit changed calls to xmalloc the object_entry list into calls to xcalloc. However, it forgot that we grow the list with xrealloc later. These extra entries are used when we add an object from elsewhere to complete a thin pack. If we add a non-delta object, its depth value will just be uninitialized heap data. This patch fixes it by zero-initializing entries we add to the objects list via the xrealloc. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-19 17:17:22 +01:00			`memset(objects + nr_objects + 1, 0,`
			`nr_unresolved * sizeof(*objects));`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`f = sha1fd(output_fd, curr_pack);`
			`fix_unresolved_deltas(f, nr_unresolved);`
index-pack: fix buffer overflow caused by translations The translation of "completed with %d local objects" is put in a 48-byte buffer, which may be enough for English but not true for any translations. Convert it to use strbuf (i.e. no hard limit on translation length). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-16 02:25:18 +01:00			`strbuf_addf(&msg, _("completed with %d local objects"),`
			`nr_objects - nr_objects_initial);`
			`stop_progress_msg(&progress, msg.buf);`
			`strbuf_release(&msg);`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`sha1close(f, tail_sha1, 0);`
			`hashcpy(read_sha1, pack_sha1);`
			`fixup_pack_header_footer(output_fd, pack_sha1,`
			`curr_pack, nr_objects,`
			`read_sha1, consumed_bytes-20);`
			`if (hashcmp(read_sha1, tail_sha1) != 0)`
i18n: mark more index-pack strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-08-31 14:13:04 +02:00			`die(_("Unexpected tail checksum for %s "`
			`"(disk corruption?)"), curr_pack);`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`}`
			`if (nr_deltas != nr_resolved_deltas)`
Merge branch 'nd/threaded-index-pack' Enables threading in index-pack to resolve base data in parallel. By Nguyễn Thái Ngọc Duy (3) and Ramsay Jones (1) * nd/threaded-index-pack: index-pack: disable threading if NO_PREAD is defined index-pack: support multithreaded delta resolving index-pack: restructure pack processing into three main functions compat/win32/pthread.h: Add an pthread_key_delete() implementation 2012-05-14 20:50:40 +02:00			`die(Q_("pack has %d unresolved delta",`
			`"pack has %d unresolved deltas",`
			`nr_deltas - nr_resolved_deltas),`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`nr_deltas - nr_resolved_deltas);`
			`}`
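The commit message attached to the `memset()` call above notes that `xrealloc()` leaves newly added entries uninitialized, so fields such as `delta_depth` would hold stale heap data unless they are cleared. The following is a minimal sketch of that pattern using plain `realloc()`: `struct demo_entry` and `grow_zeroed()` are hypothetical names, and the sketch zeroes the whole new tail rather than reproducing the exact index arithmetic used in `conclude_pack()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for index-pack's object_entry. */
struct demo_entry {
	unsigned long size;
	int delta_depth;
};

/*
 * Grow *list from old_nr to new_nr entries and zero every new entry.
 * Returns 0 on success, or -1 if realloc() fails, in which case the
 * old list is left untouched and still valid.
 */
static int grow_zeroed(struct demo_entry **list, size_t old_nr, size_t new_nr)
{
	struct demo_entry *p;

	assert(list && new_nr > 0 && new_nr >= old_nr);
	p = realloc(*list, new_nr * sizeof(*p));
	if (!p)
		return -1;
	/* realloc() preserves the old entries but not the contents of the tail. */
	memset(p + old_nr, 0, (new_nr - old_nr) * sizeof(*p));
	*list = p;
	return 0;
}
```

Entries that existed before the call keep their values; only the range from `old_nr` up to `new_nr` is cleared, which is what lets later code assume a depth of 0 for appended base objects.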

index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`static int write_compressed(struct sha1file *f, void *in, unsigned int size)`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`{`
zlib: zlib can only process 4GB at a time The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 20:52:15 +02:00			`git_zstream stream;`
index-pack: smarter memory usage when appending objects In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in write_compressed() in order to write it. Let's deflate and write the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 22:50:35 +02:00			`int status;`
			`unsigned char outbuf[4096];`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00
			`memset(&stream, 0, sizeof(stream));`
zlib: wrap deflate side of the API Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 19:55:10 +02:00			`git_deflate_init(&stream, zlib_compression_level);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`stream.next_in = in;`
			`stream.avail_in = size;`

index-pack: smarter memory usage when appending objects In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in write_compressed() in order to write it. Let's deflate and write the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 22:50:35 +02:00			`do {`
			`stream.next_out = outbuf;`
			`stream.avail_out = sizeof(outbuf);`
zlib: wrap deflate side of the API Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 19:55:10 +02:00			`status = git_deflate(&stream, Z_FINISH);`
index-pack: smarter memory usage when appending objects In the same spirit as commit 9892bebafe, let's avoid allocating the full buffer for the deflated data in write_compressed() in order to write it. Let's deflate and write the data in chunks instead to reduce memory usage. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-12 22:50:35 +02:00			`sha1write(f, outbuf, sizeof(outbuf) - stream.avail_out);`
			`} while (status == Z_OK);`

			`if (status != Z_STREAM_END)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("unable to deflate appended object (%d)"), status);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`size = stream.total_out;`
zlib: wrap deflate side of the API Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-10 19:55:10 +02:00			`git_deflate_end(&stream);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`return size;`
			`}`
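`append_obj_to_pack()`, which follows, begins by building a variable-length object header: the first byte holds the object type in bits 4 to 6 and the low four bits of the size, and each further byte holds seven more size bits, with bit 7 set on every byte except the last. The following is a minimal standalone sketch of that encoding together with a matching decoder. `encode_obj_header()` and `decode_obj_header()` are hypothetical helper names, not functions from this file.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode a pack object header into out.
 * Byte 0: continuation bit, 3-bit type, low 4 bits of size.
 * Later bytes: continuation bit plus 7 more size bits, least significant first.
 * Returns the number of bytes written (at most 10 for a 64-bit size).
 */
static int encode_obj_header(unsigned char *out, unsigned type, unsigned long size)
{
	int n = 0;
	unsigned char c;

	assert(out && type < 8);
	c = (unsigned char)((type << 4) | (size & 15));
	size >>= 4;
	while (size) {
		out[n++] = c | 0x80; /* more bytes follow */
		c = size & 0x7f;
		size >>= 7;
	}
	out[n++] = c; /* final byte: continuation bit clear */
	return n;
}

/* Inverse of encode_obj_header(); returns the number of bytes consumed. */
static int decode_obj_header(const unsigned char *in, unsigned *type, unsigned long *size)
{
	int n = 0;
	unsigned shift = 4;
	unsigned char c = in[n++];

	*type = (c >> 4) & 7;
	*size = c & 15;
	while (c & 0x80) {
		c = in[n++];
		*size |= (unsigned long)(c & 0x7f) << shift;
		shift += 7;
	}
	return n;
}
```

For example, a type of 3 with a size of 100 encodes to the two bytes `0xb4 0x06`: the low four bits of 100 are 4, and the remaining bits (100 >> 4 = 6) fit in one continuation byte.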

index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`static struct object_entry *append_obj_to_pack(struct sha1file *f,`
index-pack: Track the object_entry that creates each base_data If we free the data stored within a base_data we need the struct object_entry to get the data back again for use with another dependent delta. Storing the object_entry* in base_data makes it simple to call get_data_from_pack() to recover the compressed information. This however means that we must add the missing base object to the end of our packfile prior to calling resolve_delta() on each of the dependent deltas. Adding the base first ensures we can read the base back from the pack we are indexing, as if it had been included by the remote side. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:46 +02:00			`const unsigned char *sha1, void *buf,`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`unsigned long size, enum object_type type)`
			`{`
			`struct object_entry *obj = &objects[nr_objects++];`
			`unsigned char header[10];`
			`unsigned long s = size;`
			`int n = 0;`
			`unsigned char c = (type << 4) | (s & 15);`
			`s >>= 4;`
			`while (s) {`
			`header[n++] = c | 0x80;`
			`c = s & 0x7f;`
			`s >>= 7;`
			`}`
			`header[n++] = c;`
index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`crc32_begin(f);`
			`sha1write(f, header, n);`
index-pack.c: correctly initialize appended objects When index-pack completes a thin pack it appends objects to the pack. Since the commit 92392b4(index-pack: Honor core.deltaBaseCacheLimit when resolving deltas) such an object can be pruned in case of memory pressure, and will be read back again by get_data_from_pack(). For this to work, the fields in object_entry structure need to be initialized properly. Noticed by Pierre Habouzit. Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de> Acked-by: Nicolas Pitre <nico@cam.org> Acked-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-24 19:32:00 +02:00			`obj[0].size = size;`
			`obj[0].hdr_size = n;`
			`obj[0].type = type;`
			`obj[0].real_type = type;`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`obj[1].idx.offset = obj[0].idx.offset + n;`
index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`obj[1].idx.offset += write_compressed(f, buf, size);`
			`obj[0].idx.crc32 = crc32_end(f);`
fix pread()'s short read in index-pack Since v1.6.0.2~13^2~ the completion of a thin pack uses sha1write() for its ability to compute a SHA1 on the written data. This also provides data buffering which, along with commit 92392b4a45, will confuse pread() whenever an appended object is 1) freed due to memory pressure because of the depth-first delta processing, and 2) needed again because it has many delta children, and 3) its data is still buffered by sha1write(). Let's fix the issue by simply forcing cached data out when such an object is written so it can be pread()'d at leisure. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-10 04:08:51 +02:00			`sha1flush(f);`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`hashcpy(obj->idx.sha1, sha1);`
index-pack: Track the object_entry that creates each base_data If we free the data stored within a base_data we need the struct object_entry to get the data back again for use with another dependent delta. Storing the object_entry* in base_data makes it simple to call get_data_from_pack() to recover the compressed information. This however means that we must add the missing base object to the end of our packfile prior to calling resolve_delta() on each of the dependent deltas. Adding the base first ensures we can read the base back from the pack we are indexing, as if it had been included by the remote side. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-14 04:07:46 +02:00			`return obj;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`}`
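
The header loop in `append_obj_to_pack()` packs the object type and size into a variable-length byte sequence. The following is a minimal standalone sketch of that encoding, not code from the file: `encode_pack_header()` mirrors the loop above, and `decode_pack_header()` is a hypothetical inverse added only to show that the bytes round-trip. It assumes `type` fits in three bits.

```c
#include <stddef.h>

/*
 * Encode type and size as the loop in append_obj_to_pack() does:
 * the first byte holds the type in bits 4-6 and the low 4 bits of
 * the size; every following byte holds 7 more size bits. Bit 7 of
 * a byte is set when another byte follows. Returns the number of
 * bytes written (at most 10 for a 64-bit size).
 */
static size_t encode_pack_header(unsigned char *out, unsigned type,
				 unsigned long size)
{
	size_t n = 0;
	unsigned char c = (unsigned char)((type << 4) | (size & 15));

	size >>= 4;
	while (size) {
		out[n++] = c | 0x80;	/* more bytes follow */
		c = size & 0x7f;
		size >>= 7;
	}
	out[n++] = c;			/* last byte, bit 7 clear */
	return n;
}

/* Hypothetical inverse: returns the number of header bytes consumed. */
static size_t decode_pack_header(const unsigned char *in, unsigned *type,
				 unsigned long *size)
{
	size_t n = 0;
	unsigned shift = 4;
	unsigned char c = in[n++];

	*type = (c >> 4) & 7;
	*size = c & 15;
	while (c & 0x80) {
		c = in[n++];
		*size |= (unsigned long)(c & 0x7f) << shift;
		shift += 7;
	}
	return n;
}
```

For a type of 3 and a size of 100, the encoder emits the two bytes `0xb4 0x06`: the low nibble of the size (4) shares the first byte with the type, and the remaining bits (6) go into the second.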

			`static int delta_pos_compare(const void *_a, const void *_b)`
			`{`
			`struct delta_entry *a = *(struct delta_entry **)_a;`
			`struct delta_entry *b = *(struct delta_entry **)_b;`
			`return a->obj_no - b->obj_no;`
			`}`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`static void fix_unresolved_deltas(struct sha1file *f, int nr_unresolved)`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`{`
			`struct delta_entry **sorted_by_pos;`
common progress display support Instead of having this code duplicated in multiple places, let's have a common interface for progress display. If someday someone wishes to display a cheezy progress bar instead then only one file will have to be changed. Note: I left merge-recursive.c out since it has a strange notion of progress as it apparently increase the expected total number as it goes. Someone with more intimate knowledge of what that is supposed to mean might look at converting it to the common progress interface. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-18 20:27:45 +02:00			`int i, n = 0;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00
			`/*`
			`* Since many unresolved deltas may well be themselves base objects`
			`* for more unresolved deltas, we really want to include the`
			`* smallest number of base objects that would cover as much delta`
			`* as possible by picking the`
			`* trunc deltas first, allowing for other deltas to resolve without`
			`* additional base objects. Since most base objects are to be found`
			`* before deltas depending on them, a good heuristic is to start`
			`* resolving deltas in the same order as their position in the pack.`
			`*/`
			`sorted_by_pos = xmalloc(nr_unresolved * sizeof(*sorted_by_pos));`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`for (i = 0; i < nr_deltas; i++) {`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`if (objects[deltas[i].obj_no].real_type != OBJ_REF_DELTA)`
			`continue;`
			`sorted_by_pos[n++] = &deltas[i];`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`qsort(sorted_by_pos, n, sizeof(*sorted_by_pos), delta_pos_compare);`

			`for (i = 0; i < n; i++) {`
			`struct delta_entry *d = sorted_by_pos[i];`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`struct base_data *base_obj = alloc_base_data();`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00
			`if (objects[d->obj_no].real_type != OBJ_REF_DELTA)`
			`continue;`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`base_obj->data = read_sha1_file(d->base.sha1, &type, &base_obj->size);`
			`if (!base_obj->data)`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`continue;`

index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`if (check_sha1_signature(d->base.sha1, base_obj->data,`
			`base_obj->size, typename(type)))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("local object %s is corrupt"), sha1_to_hex(d->base.sha1));`
index-pack: eliminate recursion in find_unresolved_deltas Current find_unresolved_deltas() links all bases together in a form of tree, using struct base_data, with prev_base pointer to point to parent node. Then it traverses down from parent to children in recursive manner with all base_data allocated on stack. To eliminate recursion, we simply need to put all on heap (parse_pack_objects and fix_unresolved_deltas). After that, it's simple non-recursive depth-first traversal loop. Each node also maintains its own state (ofs and ref indices) to iterate over all children nodes. So we process one node: - if it returns a new (child) node (a parent base), we link it to our tree, then process the new node. - if it returns nothing, the node is done, free it. We go back to parent node and resume whatever it's doing. and do it until we have no nodes to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-14 13:19:54 +01:00			`base_obj->obj = append_obj_to_pack(f, d->base.sha1,`
			`base_obj->data, base_obj->size, type);`
			`find_unresolved_deltas(base_obj);`
relax usage of the progress API Since it is now OK to pass a null pointer to display_progress() and stop_progress() resulting in a no-op, then we can simplify the code and remove a bunch of lines by not making those calls conditional all the time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 19:57:33 +01:00			`display_progress(progress, nr_resolved_deltas);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`}`
			`free(sorted_by_pos);`
			`}`
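
The first half of `fix_unresolved_deltas()` filters the delta table down to the entries that are still unresolved and sorts pointers to them by pack position. A small sketch of that pattern in isolation follows. `struct entry`, its `is_ref_delta` flag, and `collect_sorted()` are hypothetical stand-ins, not git's types. The comparator here uses comparisons rather than subtraction, a common defensive variant that cannot overflow.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a delta table entry. */
struct entry {
	int obj_no;		/* position of the object in the pack */
	int is_ref_delta;	/* nonzero if still an unresolved ref delta */
};

/*
 * qsort() passes pointers to the array elements. The array holds
 * pointers, so each argument is a pointer to a pointer and needs
 * one dereference before the fields can be read.
 */
static int entry_pos_compare(const void *_a, const void *_b)
{
	const struct entry *a = *(const struct entry *const *)_a;
	const struct entry *b = *(const struct entry *const *)_b;

	return (a->obj_no > b->obj_no) - (a->obj_no < b->obj_no);
}

/*
 * Collect pointers to the flagged entries and sort them by obj_no.
 * Returns the number collected; the caller frees *out.
 */
static size_t collect_sorted(struct entry *all, size_t nr,
			     struct entry ***out)
{
	struct entry **sorted = malloc((nr ? nr : 1) * sizeof(*sorted));
	size_t i, n = 0;

	*out = sorted;
	if (!sorted)
		return 0;
	for (i = 0; i < nr; i++) {
		if (!all[i].is_ref_delta)
			continue;
		sorted[n++] = &all[i];
	}
	qsort(sorted, n, sizeof(*sorted), entry_pos_compare);
	return n;
}
```

Sorting pointers instead of the entries themselves leaves the original table in place, so any indices into it that other code holds stay valid.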

enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`static void final(const char *final_pack_name, const char *curr_pack_name,`
			`const char *final_index_name, const char *curr_index_name,`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`const char *keep_name, const char *keep_msg,`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`unsigned char *sha1)`
			`{`
General const correctness fixes We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-07 02:44:17 +01:00			`const char *report = "pack";`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`char name[PATH_MAX];`
			`int err;`

			`if (!from_stdin) {`
			`close(input_fd);`
			`} else {`
Make pack creation always fsync() the result This means that we can depend on packs always being stable on disk, simplifying a lot of the object serialization worries. And unlike loose objects, serializing pack creation IO isn't going to be a performance killer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-30 17:42:16 +02:00			`fsync_or_die(output_fd, curr_pack_name);`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`err = close(output_fd);`
			`if (err)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("error while closing pack file"));`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`}`

Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`if (keep_msg) {`
			`int keep_fd, keep_msg_len = strlen(keep_msg);`
Make sure objects/pack exists before creating a new pack In a repository created with git older than f49fb35 (git-init-db: create "pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is not created upon initialization. It was Ok because subdirectories are created as needed inside directories init-db creates, and back then, packfiles were recent invention. After the said commit, new codepaths started relying on the presense of objects/pack/ directory in the repository. This was exacerbated with 8b4eb6b (Do not perform cross-directory renames when creating packs, 2008-09-22) that moved the location temporary pack files are created from objects/ directory to objects/pack/ directory, because moving temporary to the final location was done carefully with lazy leading directory creation. Many packfile related operations in such an old repository can fail mysteriously because of this. This commit introduces two helper functions to make things work better. - odb_mkstemp() is a specialized version of mkstemp() to refactor the code and teach it to create leading directories as needed; - odb_pack_keep() refactors the code to create a ".keep" file while create leading directories as needed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-25 08:11:29 +01:00
			`if (!keep_name)`
			`keep_fd = odb_pack_keep(name, sizeof(name), sha1);`
			`else`
			`keep_fd = open(keep_name, O_RDWR|O_CREAT|O_EXCL, 0600);`

have index-pack create .keep file more carefully If by chance we receive a pack which content (list of objects) matches another pack that we already have, and if that pack is marked with a .keep file, then we should not overwrite it. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:24 +01:00			`if (keep_fd < 0) {`
			`if (errno != EEXIST)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("cannot write keep file '%s'"),`
index-pack: report error using the correct variable We feed a string pointer that is potentially NULL to die() when showing the message. Don't. Noticed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-17 23:08:36 +01:00			`keep_name ? keep_name : name);`
have index-pack create .keep file more carefully If by chance we receive a pack which content (list of objects) matches another pack that we already have, and if that pack is marked with a .keep file, then we should not overwrite it. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:24 +01:00			`} else {`
			`if (keep_msg_len > 0) {`
			`write_or_die(keep_fd, keep_msg, keep_msg_len);`
			`write_or_die(keep_fd, "\n", 1);`
			`}`
detect close failure on just-written file handles I audited git for potential undetected write failures. In the cases fixed below, the diagnostics I add mimic the diagnostics used in surrounding code, even when that means not reporting the precise strerror(errno) cause of the error. Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-24 21:20:41 +02:00			`if (close(keep_fd) != 0)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die_errno(_("cannot close written keep file '%s'"),`
index-pack: report error using the correct variable We feed a string pointer that is potentially NULL to die() when showing the message. Don't. Noticed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-17 23:08:36 +01:00			`keep_name ? keep_name : name);`
remove .keep pack lock files when done with refs update This makes both git-fetch and git-push (fetch-pack and receive-pack) safe against a possible race with aparallel git-repack -a -d that could prune the new pack while it is not yet referenced, and remove the .keep file after refs have been updated. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:25 +01:00			`report = "keep";`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`}`
			`}`

enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`if (final_pack_name != curr_pack_name) {`
			`if (!final_pack_name) {`
			`snprintf(name, sizeof(name), "%s/pack/pack-%s.pack",`
			`get_object_directory(), sha1_to_hex(sha1));`
			`final_pack_name = name;`
			`}`
			`if (move_temp_to_file(curr_pack_name, final_pack_name))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("cannot store pack file"));`
Move chmod(foo, 0444) into move_temp_to_file() When writing out a loose object or a pack (index), move_temp_to_file() is called to finalize the resulting file. These files (loose files and packs) should all have permission mode 0444 (modulo adjust_shared_perm()). Therefore, instead of doing chmod(foo, 0444) explicitly from each callsite (or even forgetting to chmod() at all), do the chmod() call from within move_temp_to_file(). Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-26 16:16:47 +01:00			`} else if (from_stdin)`
Do not rename read-only files during a push Win32 does not allow renaming read-only files (at least on a Samba share), making push into a local directory to fail. Thus, defer the chmod() call in index-pack.c:final() only after move_temp_to_file() was called. Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-03 12:20:43 +02:00			`chmod(final_pack_name, 0444);`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00
			`if (final_index_name != curr_index_name) {`
			`if (!final_index_name) {`
			`snprintf(name, sizeof(name), "%s/pack/pack-%s.idx",`
			`get_object_directory(), sha1_to_hex(sha1));`
			`final_index_name = name;`
			`}`
			`if (move_temp_to_file(curr_index_name, final_index_name))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("cannot store index file"));`
Move chmod(foo, 0444) into move_temp_to_file() When writing out a loose object or a pack (index), move_temp_to_file() is called to finalize the resulting file. These files (loose files and packs) should all have permission mode 0444 (modulo adjust_shared_perm()). Therefore, instead of doing chmod(foo, 0444) explicitly from each callsite (or even forgetting to chmod() at all), do the chmod() call from within move_temp_to_file(). Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-26 16:16:47 +01:00			`} else`
			`chmod(final_index_name, 0444);`
remove .keep pack lock files when done with refs update This makes both git-fetch and git-push (fetch-pack and receive-pack) safe against a possible race with a parallel git-repack -a -d that could prune the new pack while it is not yet referenced, and remove the .keep file after refs have been updated. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:25 +01:00
			`if (!from_stdin) {`
			`printf("%s\n", sha1_to_hex(sha1));`
			`} else {`
			`char buf[48];`
			`int len = snprintf(buf, sizeof(buf), "%s\t%s\n",`
			`report, sha1_to_hex(sha1));`
index-pack: write-or-die instead of unchecked write-in-full. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-11 22:15:51 +01:00			`write_or_die(1, buf, len);`
remove .keep pack lock files when done with refs update This makes both git-fetch and git-push (fetch-pack and receive-pack) safe against a possible race with a parallel git-repack -a -d that could prune the new pack while it is not yet referenced, and remove the .keep file after refs have been updated. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:25 +01:00
			`/*`
			`* Let's just mimic git-unpack-objects here and write`
			`* the last part of the input buffer to stdout.`
			`*/`
			`while (input_len) {`
			`err = xwrite(1, input_buffer + input_offset, input_len);`
			`if (err <= 0)`
			`break;`
			`input_len -= err;`
			`input_offset += err;`
			`}`
			`}`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
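
The loop at the end of the function above keeps calling `xwrite()` until the leftover input buffer has been flushed, because a single write may accept fewer bytes than requested. A minimal sketch of that pattern using plain POSIX `write()` follows; `write_all` is a hypothetical helper for illustration, not a function from git, and unlike the loop above it reports failure to the caller instead of silently stopping.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of git): write all `len` bytes of `buf`
 * to `fd`, retrying after short writes and EINTR.
 * Returns len on success, -1 on error (errno is set).
 */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t left = len;

	while (left) {
		ssize_t n = write(fd, p, left);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before writing, retry */
			return -1;
		}
		if (n == 0) {
			/* no progress on a non-empty request: treat as an error */
			errno = ENOSPC;
			return -1;
		}
		p += n;			/* skip what was accepted */
		left -= (size_t)n;	/* and shrink the remainder */
	}
	return (ssize_t)len;
}
```

A caller would typically use it as `if (write_all(1, buf, len) < 0) perror("write");`.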

Provide git_config with a callback-data parameter git_config() only had a function parameter, but no callback data parameter. This assumes that all callback functions only modify global variables. With this patch, every callback gets a void * parameter, and it is hoped that this will help the libification effort. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-14 19:46:53 +02:00			`static int git_index_pack_config(const char *k, const char *v, void *cb)`
make the pack index version configurable It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-02 04:26:04 +01:00			`{`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`struct pack_idx_option *opts = cb;`

make the pack index version configurable It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-02 04:26:04 +01:00			`if (!strcmp(k, "pack.indexversion")) {`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`opts->version = git_config_int(k, v);`
			`if (opts->version > 2)`
i18n: mark more index-pack strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-08-31 14:13:04 +02:00			`die(_("bad pack.indexversion=%"PRIu32), opts->version);`
make the pack index version configurable It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-02 04:26:04 +01:00			`return 0;`
			`}`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`if (!strcmp(k, "pack.threads")) {`
			`nr_threads = git_config_int(k, v);`
			`if (nr_threads < 0)`
i18n: mark more index-pack strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-08-31 14:13:04 +02:00			`die(_("invalid number of threads specified (%d)"),`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`nr_threads);`
			`#ifdef NO_PTHREADS`
			`if (nr_threads != 1)`
i18n: mark more index-pack strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-08-31 14:13:04 +02:00			`warning(_("no threads support, ignoring %s"), k);`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`nr_threads = 1;`
			`#endif`
			`return 0;`
			`}`
Provide git_config with a callback-data parameter git_config() only had a function parameter, but no callback data parameter. This assumes that all callback functions only modify global variables. With this patch, every callback gets a void * parameter, and it is hoped that this will help the libification effort. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-14 19:46:53 +02:00			`return git_default_config(k, v, cb);`
make the pack index version configurable It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-02 04:26:04 +01:00			`}`

index-pack --verify: read anomalous offsets from v2 idx file A pack v2 .idx file usually records offset using 64-bit representation only when the offset does not fit within 31-bit, but you can handcraft your .idx file to record smaller offset using 64-bit, storing all zero in the upper 4-byte. By inspecting the original idx file when running index-pack --verify, encode such low offsets that do not need to be in 64-bit but are encoded using 64-bit just like the original idx file so that we can still validate the pack/idx pair by comparing the idx file recomputed with the original. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 01:55:26 +01:00			`static int cmp_uint32(const void *a_, const void *b_)`
			`{`
			`uint32_t a = *((uint32_t *)a_);`
			`uint32_t b = *((uint32_t *)b_);`

			`return (a < b) ? -1 : (a != b);`
			`}`
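
The comparator above returns -1, 0 or 1 by comparing rather than subtracting, which matters for unsigned values: `a - b` would wrap around instead of going negative. A small standalone sketch of the same idea with `qsort()` follows; the names `cmp_u32` and `sort_u32` are made up for this illustration and are not part of git.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two uint32_t values: -1, 0 or 1, no subtraction. */
static int cmp_u32(const void *a_, const void *b_)
{
	uint32_t a = *(const uint32_t *)a_;
	uint32_t b = *(const uint32_t *)b_;

	return (a < b) ? -1 : (a != b);
}

/* Hypothetical helper: sort `n` values in ascending order, in place. */
static void sort_u32(uint32_t *v, size_t n)
{
	assert(n == 0 || v != NULL);
	if (n > 1)
		qsort(v, n, sizeof(*v), cmp_u32);
}
```

Keeping the array sorted is what would let a later lookup use `bsearch()` with the same comparator.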

			`static void read_v2_anomalous_offsets(struct packed_git *p,`
			`struct pack_idx_option *opts)`
			`{`
			`const uint32_t *idx1, *idx2;`
			`uint32_t i;`

			`/* The address of the 4-byte offset table */`
			`idx1 = (((const uint32_t *)p->index_data)`
			`+ 2 /* 8-byte header */`
			`+ 256 /* fan out */`
			`+ 5 * p->num_objects /* 20-byte SHA-1 table */`
			`+ p->num_objects /* CRC32 table */`
			`);`

			`/* The address of the 8-byte offset table */`
			`idx2 = idx1 + p->num_objects;`

			`for (i = 0; i < p->num_objects; i++) {`
			`uint32_t off = ntohl(idx1[i]);`
			`if (!(off & 0x80000000))`
			`continue;`
			`off = off & 0x7fffffff;`
			`if (idx2[off * 2])`
			`continue;`
			`/*`
			`* The real offset is ntohl(idx2[off * 2]) in high 4`
			`* octets, and ntohl(idx2[off * 2 + 1]) in low 4`
			`* octets. But idx2[off * 2] is Zero!!!`
			`*/`
			`ALLOC_GROW(opts->anomaly, opts->anomaly_nr + 1, opts->anomaly_alloc);`
			`opts->anomaly[opts->anomaly_nr++] = ntohl(idx2[off * 2 + 1]);`
			`}`

			`if (1 < opts->anomaly_nr)`
			`qsort(opts->anomaly, opts->anomaly_nr, sizeof(uint32_t), cmp_uint32);`
			`}`
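
The function above relies on the v2 idx convention that a 4-byte offset entry with its most significant bit set is not an offset but an index into the 8-byte offset table. A minimal sketch of resolving one entry follows, assuming both tables are available as raw big-endian bytes; `be32` and `resolve_offset` are hypothetical helpers for illustration, not git APIs. As in the code above, an entry is flagged as anomalous when the high 4 bytes of its 8-byte slot are zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian (network order) 32-bit word from raw bytes. */
static uint32_t be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: resolve entry `i` of the 4-byte offset table.
 * MSB clear: the entry is the pack offset itself.
 * MSB set:   the low 31 bits index the 8-byte offset table.
 * *anomalous is set to 1 when the 8-byte slot has an all-zero high word.
 */
static uint64_t resolve_offset(const unsigned char *tab32,
			       const unsigned char *tab64,
			       size_t i, int *anomalous)
{
	uint32_t off = be32(tab32 + 4 * i);
	uint32_t hi, lo;

	assert(anomalous != NULL);
	*anomalous = 0;
	if (!(off & 0x80000000u))
		return off;

	off &= 0x7fffffffu;
	hi = be32(tab64 + 8 * (size_t)off);
	lo = be32(tab64 + 8 * (size_t)off + 4);
	*anomalous = (hi == 0);
	return ((uint64_t)hi << 32) | lo;
}
```

The sketch does no bounds checking on either table; real code would have to validate the index against the size of the mapped idx file.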

index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`static void read_idx_option(struct pack_idx_option *opts, const char *pack_name)`
			`{`
			`struct packed_git *p = add_packed_git(pack_name, strlen(pack_name), 1);`

			`if (!p)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("Cannot open existing pack file '%s'"), pack_name);`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`if (open_pack_index(p))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("Cannot open existing pack idx file for '%s'"), pack_name);`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00
			`/* Read the attributes from the existing idx file */`
			`opts->version = p->index_version;`

index-pack --verify: read anomalous offsets from v2 idx file A pack v2 .idx file usually records offset using 64-bit representation only when the offset does not fit within 31-bit, but you can handcraft your .idx file to record smaller offset using 64-bit, storing all zero in the upper 4-byte. By inspecting the original idx file when running index-pack --verify, encode such low offsets that do not need to be in 64-bit but are encoded using 64-bit just like the original idx file so that we can still validate the pack/idx pair by comparing the idx file recomputed with the original. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 01:55:26 +01:00			`if (opts->version == 2)`
			`read_v2_anomalous_offsets(p, opts);`

index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`/*`
			`* Get rid of the idx file as we do not need it anymore.`
			`* NEEDSWORK: extract this bit from free_pack_by_name() in`
			`* sha1_file.c, perhaps? It shouldn't matter very much as we`
			`* know we haven't installed this pack (hence we never have`
			`* read anything from it).`
			`*/`
			`close_pack_index(p);`
			`free(p);`
			`}`

index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`static void show_pack_info(int stat_only)`
			`{`
index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00			`int i, baseobjects = nr_objects - nr_deltas;`
			`unsigned long *chain_histogram = NULL;`

			`if (deepest_delta)`
			`chain_histogram = xcalloc(deepest_delta, sizeof(unsigned long));`

index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`for (i = 0; i < nr_objects; i++) {`
			`struct object_entry *obj = &objects[i];`

index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00			`if (is_delta_type(obj->type))`
			`chain_histogram[obj->delta_depth - 1]++;`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`if (stat_only)`
			`continue;`
			`printf("%s %-6s %lu %lu %"PRIuMAX,`
			`sha1_to_hex(obj->idx.sha1),`
			`typename(obj->real_type), obj->size,`
			`(unsigned long)(obj[1].idx.offset - obj->idx.offset),`
			`(uintmax_t)obj->idx.offset);`
			`if (is_delta_type(obj->type)) {`
			`struct object_entry *bobj = &objects[obj->base_object_no];`
			`printf(" %u %s", obj->delta_depth, sha1_to_hex(bobj->idx.sha1));`
			`}`
			`putchar('\n');`
			`}`
index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00
			`if (baseobjects)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`printf_ln(Q_("non delta: %d object",`
			`"non delta: %d objects",`
			`baseobjects),`
			`baseobjects);`
index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00			`for (i = 0; i < deepest_delta; i++) {`
			`if (!chain_histogram[i])`
			`continue;`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`printf_ln(Q_("chain length = %d: %lu object",`
			`"chain length = %d: %lu objects",`
			`chain_histogram[i]),`
			`i + 1,`
			`chain_histogram[i]);`
index-pack: show histogram when emulating "verify-pack -v" The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:16 +02:00			`}`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`}`
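
The histogram in the function above has one slot per chain length, with slot `d - 1` counting the objects whose delta depth is `d`; base objects are not counted. A standalone sketch of that bookkeeping follows. `build_chain_histogram` and its `depths` input are assumptions made for illustration (git keeps the depth in `struct object_entry`), and the sketch returns the array so the caller can free it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: depths[i] is 0 for a non-delta object and the
 * delta chain length (>= 1) otherwise. Returns a zero-initialised array
 * of `deepest` counters where hist[d - 1] is the number of objects with
 * depth d, or NULL if deepest is 0 or allocation fails. Caller frees.
 */
static unsigned long *build_chain_histogram(const unsigned *depths, size_t n,
					    unsigned deepest)
{
	unsigned long *hist;
	size_t i;

	if (!deepest)
		return NULL;
	hist = calloc(deepest, sizeof(*hist));
	if (!hist)
		return NULL;

	for (i = 0; i < n; i++) {
		if (!depths[i])
			continue;	/* base object, not part of a chain */
		assert(depths[i] <= deepest);
		hist[depths[i] - 1]++;
	}
	return hist;
}
```

Sizing the array by the deepest chain actually seen is what removes the need for a fixed cap on the number of histogram rows.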

make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`int cmd_index_pack(int argc, const char **argv, const char *prefix)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`{`
index-pack: protect deepest_delta in multithread code deepest_delta is a global variable but is updated without protection in resolve_delta(), a multithreaded function. Add a new mutex for it, but only protect and update when it's actually used (i.e. show_stat is non-zero). Another variable that will not be updated is delta_depth in "struct object_entry" as it's only useful when show_stat is 1. Putting it in "if (show_stat)" makes it clearer. The local variable "stat" is renamed to "show_stat" after moving to global scope because the name "stat" conflicts with stat(2) syscall. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-19 14:01:15 +01:00			`int i, fix_thin_pack = 0, verify = 0, stat_only = 0;`
index-pack: work around thread-unsafe pread() Multi-threading of index-pack was disabled with c0f8654 (index-pack: Disable threading on cygwin - 2012-06-26), because pread() implementations for Cygwin and MSYS were not thread safe. Recent Cygwin does offer usable pread() and we enabled multi-threading with 103d530f (Cygwin 1.7 has thread-safe pread, 2013-07-19). Work around this problem on platforms with a thread-unsafe pread() emulation by opening one file handle per thread; it would prevent parallel pread() on different file handles from stepping on each other. Also remove NO_THREAD_SAFE_PREAD that was introduced in c0f8654 because it's no longer used anywhere. This workaround is unconditional, even for platforms with thread-safe pread() because the overhead is small (a couple file handles more) and not worth fragmenting the code. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Tested-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-25 14:41:41 +01:00			`const char *curr_index;`
make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`const char *index_name = NULL, *pack_name = NULL;`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`const char *keep_name = NULL, *keep_msg = NULL;`
index-pack: use strip_suffix to avoid magic numbers We also switch to using strbufs, which lets us avoid the potentially dangerous combination of a manual malloc followed by a strcpy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-06-30 18:59:10 +02:00			`struct strbuf index_name_buf = STRBUF_INIT,`
			`keep_name_buf = STRBUF_INIT;`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when referring to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`struct pack_idx_entry **idx_objects;`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`struct pack_idx_option opts;`
index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`unsigned char pack_sha1[20];`
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. 
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-26 03:16:17 +02:00			`unsigned foreign_nr = 1; /* zero is a "good" value, assume bad */`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
Let 'git <command> -h' show usage without a git dir There is no need for "git <command> -h" to depend on being inside a repository. Reported by Gerfried Fuchs through http://bugs.debian.org/462557 Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-11-09 16:05:01 +01:00			`if (argc == 2 && !strcmp(argv[1], "-h"))`
			`usage(index_pack_usage);`

rename read_replace_refs to check_replace_refs The semantics of this flag was changed in commit e1111cef23 inline lookup_replace_object() calls but wasn't renamed at the time to minimize code churn. Rename it now, and add a comment explaining its use. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-02-18 12:24:55 +01:00			`check_replace_refs = 0;`
index-pack: Don't follow replace refs. Without this, attempting to index a pack containing objects that have been replaced results in a fatal error that looks like: fatal: SHA1 COLLISION FOUND WITH <replaced-object> ! Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Acked-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-12 16:18:12 +02:00
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`reset_pack_idx_option(&opts);`
			`git_config(git_index_pack_config, &opts);`
Revert "rehabilitate 'git index-pack' inside the object store" Now setup_git_directory_gently behaves sanely even from subdirs of .git, so simplify index-pack by no longer protecting against that. This reverts commit a672ea6ac5a1b876bc7adfe6534b16fa2a32c94b (excluding tests). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-24 13:30:49 +02:00			`if (prefix && chdir(prefix))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("Cannot come back to cwd"));`
make the pack index version configurable It is a good idea to use pack index version 2 all the time since it has proper protection against propagation of certain pack corruptions when repacking which is not possible with index version 1, as demonstrated in test t5302. Hence this config option. The default is still pack index version 1. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-02 04:26:04 +01:00
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`for (i = 1; i < argc; i++) {`
make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`const char *arg = argv[i];`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
			`if (*arg == '-') {`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`if (!strcmp(arg, "--stdin")) {`
			`from_stdin = 1;`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`} else if (!strcmp(arg, "--fix-thin")) {`
			`fix_thin_pack = 1;`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`} else if (!strcmp(arg, "--strict")) {`
			`strict = 1;`
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. 
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-26 03:16:17 +02:00			`do_fsck_object = 1;`
			`} else if (!strcmp(arg, "--check-self-contained-and-connected")) {`
			`strict = 1;`
			`check_self_contained_and_connected = 1;`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`} else if (!strcmp(arg, "--verify")) {`
			`verify = 1;`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`} else if (!strcmp(arg, "--verify-stat")) {`
			`verify = 1;`
index-pack: protect deepest_delta in multithread code deepest_delta is a global variable but is updated without protection in resolve_delta(), a multithreaded function. Add a new mutex for it, but only protect and update when it's actually used (i.e. show_stat is non-zero). Another variable that will not be updated is delta_depth in "struct object_entry" as it's only useful when show_stat is 1. Putting it in "if (show_stat)" makes it clearer. The local variable "stat" is renamed to "show_stat" after moving to global scope because the name "stat" conflicts with stat(2) syscall. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-19 14:01:15 +01:00			`show_stat = 1;`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`} else if (!strcmp(arg, "--verify-stat-only")) {`
			`verify = 1;`
index-pack: protect deepest_delta in multithread code deepest_delta is a global variable but is updated without protection in resolve_delta(), a multithreaded function. Add a new mutex for it, but only protect and update when it's actually used (i.e. show_stat is non-zero). Another variable that will not be updated is delta_depth in "struct object_entry" as it's only useful when show_stat is 1. Putting it in "if (show_stat)" makes it clearer. The local variable "stat" is renamed to "show_stat" after moving to global scope because the name "stat" conflicts with stat(2) syscall. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-19 14:01:15 +01:00			`show_stat = 1;`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`stat_only = 1;`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`} else if (!strcmp(arg, "--keep")) {`
			`keep_msg = "";`
replace {pre,suf}fixcmp() with {starts,ends}_with() Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c | grep -v strbuf\\.c | xargs perl -pi -e ' s|!prefixcmp\(|starts_with\(|g; s|prefixcmp\(|!starts_with\(|g; s|!suffixcmp\(|ends_with\(|g; s|suffixcmp\(|!ends_with\(|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-30 21:55:40 +01:00			`} else if (starts_with(arg, "--keep=")) {`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`keep_msg = arg + 7;`
replace {pre,suf}fixcmp() with {starts,ends}_with() Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c | grep -v strbuf\\.c | xargs perl -pi -e ' s|!prefixcmp\(|starts_with\(|g; s|prefixcmp\(|!starts_with\(|g; s|!suffixcmp\(|ends_with\(|g; s|suffixcmp\(|!ends_with\(|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-30 21:55:40 +01:00			`} else if (starts_with(arg, "--threads=")) {`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`char *end;`
			`nr_threads = strtoul(arg+10, &end, 0);`
			`if (!arg[10] || *end || nr_threads < 0)`
			`usage(index_pack_usage);`
			`#ifdef NO_PTHREADS`
			`if (nr_threads != 1)`
i18n: mark more index-pack strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-08-31 14:13:04 +02:00			`warning(_("no threads support, "`
			`"ignoring %s"), arg);`
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`nr_threads = 1;`
			`#endif`
replace {pre,suf}fixcmp() with {starts,ends}_with() Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c | grep -v strbuf\\.c | xargs perl -pi -e ' s|!prefixcmp\(|starts_with\(|g; s|prefixcmp\(|!starts_with\(|g; s|!suffixcmp\(|ends_with\(|g; s|suffixcmp\(|!ends_with\(|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-30 21:55:40 +01:00			`} else if (starts_with(arg, "--pack_header=")) {`
Allow pack header preprocessing before unpack-objects/index-pack. Some applications which invoke unpack-objects or index-pack --stdin may want to examine the pack header to determine the number of objects contained in the pack and use that value to determine which executable to invoke to handle the rest of the pack stream. However if the caller consumes the pack header from the input stream then its no longer available for unpack-objects or index-pack --stdin, both of which need the version and object count to process the stream. This change introduces --pack_header=ver,cnt as a command line option that the caller can supply to indicate it has already consumed the pack header and what version and object count were found in that header. As this option is only meant for low level applications such as receive-pack we are not documenting it at this time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:20 +01:00			`struct pack_header *hdr;`
			`char *c;`

			`hdr = (struct pack_header *)input_buffer;`
			`hdr->hdr_signature = htonl(PACK_SIGNATURE);`
			`hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));`
			`if (*c != ',')`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("bad %s"), arg);`
Allow pack header preprocessing before unpack-objects/index-pack. Some applications which invoke unpack-objects or index-pack --stdin may want to examine the pack header to determine the number of objects contained in the pack and use that value to determine which executable to invoke to handle the rest of the pack stream. However if the caller consumes the pack header from the input stream then its no longer available for unpack-objects or index-pack --stdin, both of which need the version and object count to process the stream. This change introduces --pack_header=ver,cnt as a command line option that the caller can supply to indicate it has already consumed the pack header and what version and object count were found in that header. As this option is only meant for low level applications such as receive-pack we are not documenting it at this time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:20 +01:00			`hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));`
			`if (*c)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("bad %s"), arg);`
Allow pack header preprocessing before unpack-objects/index-pack. Some applications which invoke unpack-objects or index-pack --stdin may want to examine the pack header to determine the number of objects contained in the pack and use that value to determine which executable to invoke to handle the rest of the pack stream. However if the caller consumes the pack header from the input stream then its no longer available for unpack-objects or index-pack --stdin, both of which need the version and object count to process the stream. This change introduces --pack_header=ver,cnt as a command line option that the caller can supply to indicate it has already consumed the pack header and what version and object count were found in that header. As this option is only meant for low level applications such as receive-pack we are not documenting it at this time. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-01 23:06:20 +01:00			`input_len = sizeof(*hdr);`
add progress status to index-pack This is more interesting to look at when performing a big fetch. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:32:59 +02:00			`} else if (!strcmp(arg, "-v")) {`
			`verbose = 1;`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`} else if (!strcmp(arg, "-o")) {`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`if (index_name || (i+1) >= argc)`
			`usage(index_pack_usage);`
			`index_name = argv[++i];`
replace {pre,suf}fixcmp() with {starts,ends}_with() Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c | grep -v strbuf\\.c | xargs perl -pi -e ' s|!prefixcmp\(|starts_with\(|g; s|prefixcmp\(|!starts_with\(|g; s|!suffixcmp\(|ends_with\(|g; s|suffixcmp\(|!ends_with\(|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-30 21:55:40 +01:00			`} else if (starts_with(arg, "--index-version=")) {`
allow forcing index v2 and 64-bit offset treshold This is necessary for testing the new capabilities in some automated way without having an actual 4GB+ pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 23:32:03 +02:00			`char *c;`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`opts.version = strtoul(arg + 16, &c, 10);`
			`if (opts.version > 2)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("bad %s"), arg);`
allow forcing index v2 and 64-bit offset treshold This is necessary for testing the new capabilities in some automated way without having an actual 4GB+ pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 23:32:03 +02:00			`if (*c == ',')`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`opts.off32_limit = strtoul(c+1, &c, 0);`
			`if (*c || opts.off32_limit & 0x80000000)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("bad %s"), arg);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`} else`
			`usage(index_pack_usage);`
			`continue;`
			`}`
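
The `--index-version=` branch above accepts a version number, optionally followed by a comma and a 32-bit offset limit. As a rough standalone sketch of the same checks (the struct and function names are hypothetical stand-ins, not git's `pack_idx_option` API, and errors are returned rather than passed to `die()`):

```c
#include <stdlib.h>

/* Hypothetical stand-in for the two pack_idx_option fields used here. */
struct idx_opts_sketch {
	unsigned long version;
	unsigned long off32_limit;
};

/*
 * Parse the value part of "--index-version=ver[,off]".
 * Returns 0 on success and -1 where the listing above would die().
 * The caller is assumed to have set opts->off32_limit to a default first.
 */
static int parse_index_version_sketch(const char *value,
				      struct idx_opts_sketch *opts)
{
	char *c;

	opts->version = strtoul(value, &c, 10);
	if (opts->version > 2)
		return -1;
	if (*c == ',')
		/* base 0 accepts decimal, octal and hexadecimal offsets */
		opts->off32_limit = strtoul(c + 1, &c, 0);
	/* reject trailing characters or an offset with bit 31 set */
	if (*c || (opts->off32_limit & 0x80000000UL))
		return -1;
	return 0;
}
```

Because the offset is only overwritten when a comma is present, a bare `--index-version=2` keeps whatever default the caller installed beforehand.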

			`if (pack_name)`
			`usage(index_pack_usage);`
			`pack_name = arg;`
			`}`
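
Among the options handled in the loop above, `--pack_header=ver,cnt` lets a caller that has already consumed the pack header hand the version and object count back to index-pack. A minimal sketch of that two-field parse, assuming a hypothetical helper name and plain host-order outputs (the listing itself stores both values with `htonl()` into `input_buffer`, which is omitted here):

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse the value part of "--pack_header=ver,cnt".
 * Returns 0 on success and -1 where the listing above would die().
 */
static int parse_pack_header_sketch(const char *value,
				    uint32_t *version, uint32_t *entries)
{
	char *c;

	*version = (uint32_t)strtoul(value, &c, 10);
	if (*c != ',')
		return -1;	/* the comma separator is mandatory */
	*entries = (uint32_t)strtoul(c + 1, &c, 10);
	if (*c)
		return -1;	/* nothing may follow the object count */
	return 0;
}
```

Like the original, this only checks the separator and the end of the string; it does not verify that digits were actually present in either field.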

enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`if (!pack_name && !from_stdin)`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`usage(index_pack_usage);`
make index-pack able to complete thin packs. A new flag, --fix-thin, instructs git-index-pack to append any missing objects to a thin pack to make it self contained and indexable. Of course objects missing from the pack must be present elsewhere in the local repository. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 05:28:17 +02:00			`if (fix_thin_pack && !from_stdin)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("--fix-thin cannot be used without --stdin"));`
enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`if (!index_name && pack_name) {`
index-pack: use strip_suffix to avoid magic numbers We also switch to using strbufs, which lets us avoid the potentially dangerous combination of a manual malloc followed by a strcpy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-06-30 18:59:10 +02:00			`size_t len;`
			`if (!strip_suffix(pack_name, ".pack", &len))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("packfile name '%s' does not end with '.pack'"),`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`pack_name);`
index-pack: use strip_suffix to avoid magic numbers We also switch to using strbufs, which lets us avoid the potentially dangerous combination of a manual malloc followed by a strcpy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-06-30 18:59:10 +02:00			`strbuf_add(&index_name_buf, pack_name, len);`
			`strbuf_addstr(&index_name_buf, ".idx");`
			`index_name = index_name_buf.buf;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`}`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`if (keep_msg && !keep_name && pack_name) {`
index-pack: use strip_suffix to avoid magic numbers We also switch to using strbufs, which lets us avoid the potentially dangerous combination of a manual malloc followed by a strcpy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-06-30 18:59:10 +02:00			`size_t len;`
			`if (!strip_suffix(pack_name, ".pack", &len))`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("packfile name '%s' does not end with '.pack'"),`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`pack_name);`
index-pack: use strip_suffix to avoid magic numbers We also switch to using strbufs, which lets us avoid the potentially dangerous combination of a manual malloc followed by a strcpy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-06-30 18:59:10 +02:00			`strbuf_add(&keep_name_buf, pack_name, len);`
			`strbuf_addstr(&keep_name_buf, ".keep");`
			`keep_name = keep_name_buf.buf;`
Teach git-index-pack how to keep a pack file. To prevent a race condition between `index-pack --stdin` and `repack -a -d` where the repack deletes the newly created pack file before any refs are updated to reference objects contained within it we mark the pack file as one that should be kept. This removes it from the list of packs that `repack -a -d` will consider for removal. Callers such as `receive-pack` which want to invoke `index-pack` should use this new --keep option to prevent the newly created pack and index file pair from being deleted before they have finished any related ref updates. Only after all ref updates have been finished should the associated .keep file be removed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 10:41:59 +01:00			`}`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`if (verify) {`
			`if (!index_name)`
i18n: index-pack: mark strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-23 14:30:29 +02:00			`die(_("--verify with no packfile name given"));`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`read_idx_option(&opts, index_name);`
receive-pack, fetch-pack: reject bogus pack that records objects twice When receive-pack & fetch-pack are run and store the pack obtained over the wire to a local repository, they internally run the index-pack command with the --strict option. Make sure that we reject incoming packfile that records objects twice to avoid spreading such a damage. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-17 07:04:13 +01:00			`opts.flags |= WRITE_IDX_VERIFY | WRITE_IDX_STRICT;`
index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`}`
receive-pack, fetch-pack: reject bogus pack that records objects twice When receive-pack & fetch-pack are run and store the pack obtained over the wire to a local repository, they internally run the index-pack command with the --strict option. Make sure that we reject incoming packfile that records objects twice to avoid spreading such a damage. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-17 07:04:13 +01:00			`if (strict)`
			`opts.flags |= WRITE_IDX_STRICT;`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
index-pack: support multithreaded delta resolving This puts delta resolving on each base on a separate thread, one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthreads calls are made. The system essentially runs in non-thread mode. An experiment on a Xeon 24 core machine with git.git shows that performance does not increase proportional to the number of cores. So by default, we use maximum 3 cores. Some numbers with --threads from 1 to 16: 1..4 real 0m8.003s 0m5.307s 0m4.321s 0m3.830s user 0m7.720s 0m8.009s 0m8.133s 0m8.305s sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s 5..8 real 0m3.727s 0m3.604s 0m3.332s 0m3.369s user 0m9.361s 0m9.817s 0m9.525s 0m9.769s sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s 9..12 real 0m3.036s 0m3.139s 0m3.177s 0m2.961s user 0m8.977s 0m10.205s 0m9.737s 0m10.073s sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s 13..16 real 0m2.985s 0m2.894s 0m2.975s 0m2.971s user 0m9.825s 0m10.573s 0m10.833s 0m11.361s sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s On an Intel dual core and linux-2.6.git 1..4 real 2m37.789s 2m7.963s 2m0.920s 1m58.213s user 2m28.415s 2m52.325s 2m50.176s 2m41.187s sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s Thanks Ramsay Jones for troubleshooting and support on MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:55 +02:00			`#ifndef NO_PTHREADS`
			`if (!nr_threads) {`
			`nr_threads = online_cpus();`
			`/* An experiment showed that more threads does not mean faster */`
			`if (nr_threads > 3)`
			`nr_threads = 3;`
			`}`
			`#endif`

enable index-pack streaming capability A new flag, --stdin, allows for a pack to be received over a stream. When this flag is provided, the pack content is written to either the named pack file or directly to the object repository under the same name as produced by git-repack. The pack index is written as well with the corresponding base name, unless the index name is overriden with -o. With this patch, git-index-pack could be used instead of git-unpack-objects when fetching remote objects but only with non "thin" packs for now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-23 20:50:18 +02:00			`curr_pack = open_pack_file(pack_name);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`parse_pack_header();`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));`
			`deltas = xcalloc(nr_objects, sizeof(struct delta_entry));`
index-pack: use fixup_pack_header_footer()'s validation mode When completing a thin pack, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written for both: the original pack and the appended objects. To do so, a couple write_or_die() calls were converted to sha1write() which has the advantage of doing some buffering as well as handling SHA1 and CRC32 checksum already. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:01 +02:00			`parse_pack_objects(pack_sha1);`
index-pack: restructure pack processing into three main functions The second pass in parse_pack_objects() are split into resolve_deltas(). The final phase, fixing thin pack or just seal the pack, is now in conclude_pack() function. Main pack processing is now a sequence of these functions: - parse_pack_objects() reads through the input pack - resolve_deltas() makes sure all deltas can be resolved - conclude_pack() seals the output pack - write_idx_file() writes companion index file - final() moves the pack/index to proper place Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 14:31:54 +02:00			`resolve_deltas();`
			`conclude_pack(fix_thin_pack, curr_pack, pack_sha1);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`free(deltas);`
index-pack: introduce checking mode Adds strict option, which bails out if the pack would introduces broken object or links in the repository. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-25 22:46:12 +01:00			`if (strict)`
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-26 03:16:17 +02:00			`foreign_nr = check_objects();`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00
index-pack: protect deepest_delta in multithread code deepest_delta is a global variable but is updated without protection in resolve_delta(), a multithreaded function. Add a new mutex for it, but only protect and update when it's actually used (i.e. show_stat is non-zero). Another variable that will not be updated is delta_depth in "struct object_entry" as it's only useful when show_stat is 1. Putting it in "if (show_stat)" makes it clearer. The local variable "stat" is renamed to "show_stat" after moving to global scope because the name "stat" conflicts with stat(2) syscall. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-19 14:01:15 +01:00			`if (show_stat)`
index-pack: start learning to emulate "verify-pack -v" The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-04 00:32:15 +02:00			`show_pack_info(stat_only);`

Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`idx_objects = xmalloc((nr_objects) * sizeof(struct pack_idx_entry *));`
			`for (i = 0; i < nr_objects; i++)`
			`idx_objects[i] = &objects[i].idx;`
write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-26 00:43:25 +01:00			`curr_index = write_idx_file(index_name, idx_objects, nr_objects, &opts, pack_sha1);`
Unify write_index_file functions This patch unifies the write_index_file functions in builtin-pack-objects.c and index-pack.c. As the name "index" is overloaded in git, move in the direction of using "idx" and "pack idx" when refering to the pack index. There should be no change in functionality. Signed-off-by: Geert Bosch <bosch@gnat.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-06-01 21:18:05 +02:00			`free(idx_objects);`

index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`if (!verify)`
			`final(pack_name, curr_pack,`
			`index_name, curr_index,`
			`keep_name, keep_msg,`
			`pack_sha1);`
			`else`
			`close(input_fd);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`free(objects);`
index-pack: use strip_suffix to avoid magic numbers We also switch to using strbufs, which lets us avoid the potentially dangerous combination of a manual malloc followed by a strcpy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-06-30 18:59:10 +02:00			`strbuf_release(&index_name_buf);`
			`strbuf_release(&keep_name_buf);`
fix for more minor memory leaks Now that some pointers have lost their const attribute, we can free their associated memory when done with them. This is more a correctness issue about the rule for freeing those pointers which isn't completely trivial more than the leak itself which didn't matter as the program is exiting anyway. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-17 03:55:50 +02:00			`if (pack_name == NULL)`
make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`free((void *) curr_pack);`
fix for more minor memory leaks Now that some pointers have lost their const attribute, we can free their associated memory when done with them. This is more a correctness issue about the rule for freeing those pointers which isn't completely trivial more than the leak itself which didn't matter as the program is exiting anyway. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-17 03:55:50 +02:00			`if (index_name == NULL)`
make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-22 16:55:19 +01:00			`free((void *) curr_index);`
Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-26 03:16:17 +02:00			`/*`
			`* Let the caller know this pack is not self contained`
			`*/`
			`if (check_self_contained_and_connected && foreign_nr)`
			`return 1;`

Add git-index-pack utility git-index-pack builds a pack index file for an existing packed archive. With this utility a packed archive which was transferred without the corresponding pack index can be added to objects/pack/ without repacking. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 21:01:31 +02:00			`return 0;`
			`}`
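
The function above derives the names of a pack's companion files by stripping the `.pack` suffix from `pack_name` and appending a new extension (`.idx` for the index, and a `.keep` file as described in the `--keep` commit message). A minimal standalone sketch of that step follows. It does not use git's own `strip_suffix()` or `strbuf` API: `strip_suffix_sketch()` and `derive_sibling_name()` are hypothetical helpers written for illustration only, and plain `malloc` stands in for the strbuf handling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for git's strip_suffix(): if str ends with
 * suffix, store the length of str without the suffix in *out_len and
 * return 1; otherwise return 0 and leave *out_len untouched.
 */
static int strip_suffix_sketch(const char *str, const char *suffix,
			       size_t *out_len)
{
	size_t len = strlen(str);
	size_t suflen = strlen(suffix);

	if (len < suflen || memcmp(str + len - suflen, suffix, suflen) != 0)
		return 0;
	*out_len = len - suflen;
	return 1;
}

/*
 * Hypothetical helper: replace a trailing ".pack" in pack_name with
 * new_ext (for example ".idx" or ".keep"). Returns a malloc'd string
 * the caller must free, or NULL if pack_name does not end with ".pack"
 * or the allocation fails. The real code calls die() in the first case.
 */
static char *derive_sibling_name(const char *pack_name, const char *new_ext)
{
	size_t base_len;
	size_t ext_len;
	char *out;

	assert(pack_name && new_ext);
	if (!strip_suffix_sketch(pack_name, ".pack", &base_len))
		return NULL;

	ext_len = strlen(new_ext);
	out = malloc(base_len + ext_len + 1);
	if (!out)
		return NULL;
	memcpy(out, pack_name, base_len);
	/* Copy the extension together with its terminating NUL. */
	memcpy(out + base_len, new_ext, ext_len + 1);
	return out;
}
```

Under these assumptions, `derive_sibling_name("pack-1234.pack", ".idx")` yields a newly allocated `pack-1234.idx`, and a name that does not end in `.pack` yields `NULL`. Measuring the suffix instead of hard-coding its length is the point of the `strip_suffix` change recorded in the blame annotations: it avoids magic numbers such as `len - 5`.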