
mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-16 06:03:44 +01:00

953 lines, 23 KiB, C


fetch-pack: move core code to libgit.a
fetch_pack() is used by transport.c, part of libgit.a while it stays in builtin/fetch-pack.c. Move it to fetch-pack.c so that we won't get undefined reference if a program that uses libgit.a happens to pull it in.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
2012-10-26 17:53:55 +02:00
			`#include "cache.h"`
			`#include "refs.h"`
			`#include "pkt-line.h"`
			`#include "commit.h"`
			`#include "tag.h"`
			`#include "exec_cmd.h"`
			`#include "pack.h"`
			`#include "sideband.h"`
			`#include "fetch-pack.h"`
			`#include "remote.h"`
			`#include "run-command.h"`
cache.h: move remote/connect API out of it
The definition of "struct ref" in "cache.h", a header file so central to the system, always confused me. This structure is not about the local ref used by sha1-name API to name local objects. It is what refspecs are expanded into, after finding out what refs the other side has, to define what refs are updated after object transfer succeeds to what values. It belongs to "remote.h" together with "struct refspec".
While we are at it, also move the types and functions related to the Git transport connection to a new header file connect.h
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-08 22:56:53 +02:00
			`#include "connect.h"`
			`#include "transport.h"`
			`#include "version.h"`
fetch-pack: avoid quadratic behavior in rev_list_push
When we call find_common to start finding common ancestors with the remote side of a fetch, the first thing we do is insert the tip of each ref into our rev_list linked list. We keep the list sorted the whole time with commit_list_insert_by_date, which means our insertion ends up doing O(n^2) timestamp comparisons.
We could teach rev_list_push to use an unsorted list, and then sort it once after we have added each ref. However, in get_rev, we process the list by popping commits off the front and adding parents back in timestamp-sorted order. So that procedure would still operate on the large list. Instead, we can replace the linked list with a heap-based priority queue, which can do O(log n) insertion, making the whole insertion procedure O(n log n).
As a result of switching to the prio_queue struct, we fix two minor bugs:
1. When we "pop" a commit in get_rev, and when we clear the rev_list in find_common, we do not take care to free the "struct commit_list", and just leak its memory. With the prio_queue implementation, the memory management is handled for us.
2. In get_rev, we look at the head commit of the list, possibly push its parents onto the list, and then "pop" the front of the list off, assuming it is the same element that we just peeked at. This is typically going to be the case, but would not be in the face of clock skew: the parents are inserted by date, and could potentially be inserted at the head of the list if they have a timestamp newer than their descendent. In this case, we would accidentally pop the parent, and never process it at all. The new implementation pulls the commit off of the queue as we examine it, and so does not suffer from this problem.
With this patch, a fetch of a single commit into a repository with 50,000 refs went from:
  real 0m7.984s  user 0m7.852s  sys 0m0.120s
to:
  real 0m2.017s  user 0m1.884s  sys 0m0.124s
Before this patch, a larger case with 370K refs still had not completed after tens of minutes; with this patch, it completes in about 12 seconds.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-02 08:24:21 +02:00
			`#include "prio-queue.h"`
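The heap-based priority queue that the "avoid quadratic behavior in rev_list_push" commit message describes can be sketched in isolation. This is a minimal illustration, assuming a fixed-capacity array and a hypothetical `struct toy_commit` that carries only a date. It is not git's `prio-queue.h` API, which takes a comparison callback (`compare_commits_by_commit_date` in the initializer below). Both operations are O(log n), and `toy_queue_get()` removes exactly the element it returns, which is the property that avoids the clock-skew bug in point 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct commit: only the date matters here. */
struct toy_commit {
	const char *name;
	unsigned long date;
};

#define QUEUE_CAP 64

/* A binary max-heap keyed on date: the newest commit sits at array[0]. */
struct toy_queue {
	struct toy_commit *array[QUEUE_CAP];
	size_t nr;
};

static int newer(const struct toy_commit *a, const struct toy_commit *b)
{
	return a->date > b->date;
}

static void swap_commits(struct toy_commit **a, struct toy_commit **b)
{
	struct toy_commit *tmp = *a;
	*a = *b;
	*b = tmp;
}

/* O(log n): append at the end, then sift up while newer than the parent. */
static void toy_queue_put(struct toy_queue *q, struct toy_commit *c)
{
	size_t ix;

	assert(q->nr < QUEUE_CAP);
	ix = q->nr++;
	q->array[ix] = c;
	while (ix > 0) {
		size_t parent = (ix - 1) / 2;
		if (!newer(q->array[ix], q->array[parent]))
			break;
		swap_commits(&q->array[ix], &q->array[parent]);
		ix = parent;
	}
}

/* O(log n): take the root, move the last element to the top, sift it down. */
static struct toy_commit *toy_queue_get(struct toy_queue *q)
{
	struct toy_commit *result;
	size_t ix = 0;

	if (!q->nr)
		return NULL;
	result = q->array[0];
	q->array[0] = q->array[--q->nr];
	for (;;) {
		size_t child = 2 * ix + 1;
		if (child >= q->nr)
			break;
		if (child + 1 < q->nr && newer(q->array[child + 1], q->array[child]))
			child++;
		if (!newer(q->array[child], q->array[ix]))
			break;
		swap_commits(&q->array[child], &q->array[ix]);
		ix = child;
	}
	return result;
}
```

Whatever order commits are inserted in, they come back newest first, and a parent pushed during processing can never be confused with the commit already taken off the queue.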
			`static int transfer_unpack_limit = -1;`
			`static int fetch_unpack_limit = -1;`
			`static int unpack_limit = 100;`
			`static int prefer_ofs_delta = 1;`
			`static int no_done;`
			`static int fetch_fsck_objects = -1;`
			`static int transfer_fsck_objects = -1;`
			`static int agent_supported;`
fetch-pack: prepare updated shallow file before fetching the pack
index-pack --strict looks up and follows parent commits. If shallow information is not ready by the time index-pack is run, index-pack may be led to non-existent objects. Make fetch-pack save shallow file to disk before invoking index-pack.
git learns new global option --shallow-file to pass on the alternate shallow file path. Undocumented (and not even support --shallow-file= syntax) because it's unlikely to be used again elsewhere.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-26 03:16:15 +02:00
			`static struct lock_file shallow_lock;`
			`static const char *alternate_shallow_file;`
			`#define COMPLETE (1U << 0)`
			`#define COMMON (1U << 1)`
			`#define COMMON_REF (1U << 2)`
			`#define SEEN (1U << 3)`
			`#define POPPED (1U << 4)`

			`static int marked;`

			`/*`
			`* After sending this many "have"s, if we do not get any new ACK, we`
			`* give up traversing our history.`
			`*/`
			`#define MAX_IN_VAIN 256`

			`static struct prio_queue rev_list = { compare_commits_by_commit_date };`
fetch: fetch objects by their exact SHA-1 object names
Teach "git fetch" to accept an exact SHA-1 object name the user may obtain out of band on the LHS of a pathspec, and send it on a "want" message when the server side advertises the allow-tip-sha1-in-want capability.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-29 23:02:15 +01:00
			`static int non_common_revs, multi_ack, use_sideband, allow_tip_sha1_in_want;`
			`static void rev_list_push(struct commit *commit, int mark)`
			`{`
			`if (!(commit->object.flags & mark)) {`
			`commit->object.flags |= mark;`

			`if (!(commit->object.parsed))`
			`if (parse_commit(commit))`
			`return;`

			`prio_queue_put(&rev_list, commit);`
			`if (!(commit->object.flags & COMMON))`
			`non_common_revs++;`
			`}`
			`}`

			`static int rev_list_insert_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)`
			`{`
			`struct object *o = deref_tag(parse_object(sha1), refname, 0);`

			`if (o && o->type == OBJ_COMMIT)`
			`rev_list_push((struct commit *)o, SEEN);`

			`return 0;`
			`}`

			`static int clear_marks(const char *refname, const unsigned char *sha1, int flag, void *cb_data)`
			`{`
			`struct object *o = deref_tag(parse_object(sha1), refname, 0);`

			`if (o && o->type == OBJ_COMMIT)`
			`clear_commit_marks((struct commit *)o,`
			`COMMON | COMMON_REF | SEEN | POPPED);`
			`return 0;`
			`}`

			`/*`
			`This function marks a rev and its ancestors as common.`
			`In some cases, it is desirable to mark only the ancestors (for example`
			`when only the server does not yet know that they are common).`
			`*/`

			`static void mark_common(struct commit *commit,`
			`int ancestors_only, int dont_parse)`
			`{`
			`if (commit != NULL && !(commit->object.flags & COMMON)) {`
			`struct object *o = (struct object *)commit;`

			`if (!ancestors_only)`
			`o->flags |= COMMON;`

			`if (!(o->flags & SEEN))`
			`rev_list_push(commit, SEEN);`
			`else {`
			`struct commit_list *parents;`

			`if (!ancestors_only && !(o->flags & POPPED))`
			`non_common_revs--;`
			`if (!o->parsed && !dont_parse)`
			`if (parse_commit(commit))`
			`return;`

			`for (parents = commit->parents;`
			`parents;`
			`parents = parents->next)`
			`mark_common(parents->item, 0, dont_parse);`
			`}`
			`}`
			`}`
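`mark_common()` above is easier to follow with the queue and lazy parsing stripped out. The sketch below uses a hypothetical `struct dag_commit` with at most two parents and keeps only the recursion and the `ancestors_only` switch. It deliberately drops the `SEEN`/`POPPED` bookkeeping and the `non_common_revs` counter; in the real function, a commit not yet `SEEN` is queued with `rev_list_push()` rather than recursed into, so this is an illustration, not a drop-in replacement.

```c
#include <assert.h>
#include <stddef.h>

#define COMMON (1U << 1)   /* same bit as the #define at the top of the file */
#define MAX_PARENTS 2

/* Hypothetical in-memory commit; git's struct commit is much richer. */
struct dag_commit {
	unsigned int flags;
	struct dag_commit *parents[MAX_PARENTS]; /* unused slots are NULL */
};

/*
 * Simplified mark_common(): flag the commit unless ancestors_only is set,
 * then flag every ancestor. In this sketch a commit already flagged COMMON
 * stops the recursion, because its ancestors were flagged along with it.
 */
static void mark_common_sketch(struct dag_commit *commit, int ancestors_only)
{
	size_t i;

	if (!commit || (commit->flags & COMMON))
		return;
	if (!ancestors_only)
		commit->flags |= COMMON;
	for (i = 0; i < MAX_PARENTS && commit->parents[i]; i++)
		mark_common_sketch(commit->parents[i], 0);
}
```

Calling it with `ancestors_only = 1` on a tip leaves the tip itself unflagged, matching the case the comment above describes, where only the ancestors are known to be common.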

			`/*`
			`Get the next rev to send, ignoring the common.`
			`*/`

			`static const unsigned char *get_rev(void)`
			`{`
			`struct commit *commit = NULL;`

			`while (commit == NULL) {`
			`unsigned int mark;`
			`struct commit_list *parents;`

			`if (rev_list.nr == 0 || non_common_revs == 0)`
			`return NULL;`

			`commit = prio_queue_get(&rev_list);`
			`if (!commit->object.parsed)`
			`parse_commit(commit);`
			`parents = commit->parents;`

			`commit->object.flags |= POPPED;`
			`if (!(commit->object.flags & COMMON))`
			`non_common_revs--;`

			`if (commit->object.flags & COMMON) {`
			`/* do not send "have", and ignore ancestors */`
			`commit = NULL;`
			`mark = COMMON | SEEN;`
			`} else if (commit->object.flags & COMMON_REF)`
			`/* send "have", and ignore ancestors */`
			`mark = COMMON | SEEN;`
			`else`
			`/* send "have", also for its ancestors */`
			`mark = SEEN;`

			`while (parents) {`
			`if (!(parents->item->object.flags & SEEN))`
			`rev_list_push(parents->item, mark);`
			`if (mark & COMMON)`
			`mark_common(parents->item, 1, 0);`
			`parents = parents->next;`
			`}`
			`}`

			`return commit->object.sha1;`
			`}`
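The three-way branch in `get_rev()` decides two things for each popped commit: whether a "have" is sent for it, and which mark its parents are pushed with. A hypothetical helper, `classify_popped()`, isolates that decision table, reusing the flag values defined at the top of the file; the queue handling and `non_common_revs` bookkeeping are left out.

```c
#include <assert.h>

/* Flag values copied from the #defines at the top of this file. */
#define COMMON     (1U << 1)
#define COMMON_REF (1U << 2)
#define SEEN       (1U << 3)

/*
 * Hypothetical helper mirroring get_rev(): returns 1 if a "have" should
 * be sent for a popped commit with these flags, and stores in
 * *parent_mark the mark its parents are pushed with.
 */
static int classify_popped(unsigned int flags, unsigned int *parent_mark)
{
	if (flags & COMMON) {
		/* do not send "have", and ignore ancestors */
		*parent_mark = COMMON | SEEN;
		return 0;
	}
	if (flags & COMMON_REF) {
		/* send "have", and ignore ancestors */
		*parent_mark = COMMON | SEEN;
		return 1;
	}
	/* send "have", also for its ancestors */
	*parent_mark = SEEN;
	return 1;
}
```

Note that `COMMON` is tested first, so a commit carrying both `COMMON` and `COMMON_REF` is not announced at all.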

			`enum ack_type {`
			`NAK = 0,`
			`ACK,`
			`ACK_continue,`
			`ACK_common,`
			`ACK_ready`
			`};`

			`static void consume_shallow_list(struct fetch_pack_args *args, int fd)`
			`{`
			`if (args->stateless_rpc && args->depth > 0) {`
			`/* If we sent a depth we will get back "duplicate"`
			`* shallow and unshallow commands every time there`
			`* is a block of have lines exchanged.`
			`*/`
pkt-line: provide a LARGE_PACKET_MAX static buffer
Most of the callers of packet_read_line just read into a static 1000-byte buffer (callers which handle arbitrary binary data already use LARGE_PACKET_MAX). This works fine in practice, because:
1. The only variable-sized data in these lines is a ref name, and refs tend to be a lot shorter than 1000 characters.
2. When sending ref lines, git-core always limits itself to 1000 byte packets.
However, the only limit given in the protocol specification in Documentation/technical/protocol-common.txt is LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in pack-protocol.txt, and then only describing what we write, not as a specific limit for readers.
This patch lets us bump the 1000-byte limit to LARGE_PACKET_MAX. Even though git-core will never write a packet where this makes a difference, there are two good reasons to do this:
1. Other git implementations may have followed protocol-common.txt and used a larger maximum size. We don't bump into it in practice because it would involve very long ref names.
2. We may want to increase the 1000-byte limit one day. Since packets are transferred before any capabilities, it's difficult to do this in a backwards-compatible way. But if we bump the size of buffer the readers can handle, eventually older versions of git will be obsolete enough that we can justify bumping the writers, as well. We don't have plans to do this anytime soon, but there is no reason not to start the clock ticking now.
Just bumping all of the reading bufs to LARGE_PACKET_MAX would waste memory. Instead, since most readers just read into a temporary buffer anyway, let's provide a single static buffer that all callers can use. We can further wrap this detail away by having the packet_read_line wrapper just use the buffer transparently and return a pointer to the static storage. That covers most of the cases, and the remaining ones already read into their own LARGE_PACKET_MAX buffers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 21:02:57 +01:00
			`char *line;`
			`while ((line = packet_read_line(fd, NULL))) {`
			`if (!prefixcmp(line, "shallow "))`
			`continue;`
			`if (!prefixcmp(line, "unshallow "))`
			`continue;`
			`die("git fetch-pack: expected shallow list");`
			`}`
			`}`
			`}`
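The "provide a LARGE_PACKET_MAX static buffer" commit above has `packet_read_line()` return a pointer into one shared buffer. The sketch below shows the same idea over an in-memory byte string instead of a file descriptor. It assumes the pkt-line framing from protocol-common.txt: a 4-hex-digit length that counts the header itself, with `0000` as a flush packet. `read_pkt_line()` and `SKETCH_LARGE_PACKET_MAX` are hypothetical names, and the sketch does not special-case an empty `0004` data packet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* git's LARGE_PACKET_MAX is 65520; the value is copied for the sketch. */
#define SKETCH_LARGE_PACKET_MAX 65520

/* One static buffer shared by every caller, as the commit describes. */
static char packet_buffer[SKETCH_LARGE_PACKET_MAX + 1];

static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode the pkt-line at data[*pos] (requires *pos <= size) and advance
 * *pos past it. Returns a pointer into packet_buffer, NUL-terminated and
 * with one trailing newline stripped; *len gets the resulting length.
 * Returns NULL for a "0000" flush packet or for malformed input.
 */
static char *read_pkt_line(const char *data, size_t size, size_t *pos, int *len)
{
	int i, n = 0;

	*len = 0;
	if (size - *pos < 4)
		return NULL;                 /* no room for a length header */
	for (i = 0; i < 4; i++) {
		int d = hex_digit(data[*pos + i]);
		if (d < 0)
			return NULL;         /* malformed length header */
		n = n * 16 + d;
	}
	if (n == 0) {
		*pos += 4;                   /* flush packet */
		return NULL;
	}
	if (n < 4 || (size_t)n > size - *pos || n - 4 > SKETCH_LARGE_PACKET_MAX)
		return NULL;                 /* bad length or truncated input */
	memcpy(packet_buffer, data + *pos + 4, n - 4);
	*pos += n;
	n -= 4;
	if (n > 0 && packet_buffer[n - 1] == '\n')
		n--;                         /* strip the trailing newline */
	packet_buffer[n] = '\0';
	*len = n;
	return packet_buffer;
}
```

The trade-off of a single static buffer is that each call overwrites the previous line, so a caller that needs to keep a line must copy it before reading the next one.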

			`static enum ack_type get_ack(int fd, unsigned char *result_sha1)`
			`{`
			`int len;`
			`char *line = packet_read_line(fd, &len);`
			`if (!len)`
			`die("git fetch-pack: expected ACK/NAK, got EOF");`
			`if (!strcmp(line, "NAK"))`
			`return NAK;`
			`if (!prefixcmp(line, "ACK ")) {`
			`if (!get_sha1_hex(line+4, result_sha1)) {`
fetch-pack: fix out-of-bounds buffer offset in get_ack
When we read acks from the remote, we expect either:
  ACK <sha1>
or
  ACK <sha1> <multi-ack-flag>
We parse the "ACK <sha1>" bit from the line, and then start looking for the flag strings at "line+45"; if we don't have them, we assume it's of the first type. But if we do have the first type, then line+45 is not necessarily inside our string at all!
It turns out that this works most of the time due to the way we parse the packets. They should come in with a newline, and packet_read puts an extra NUL into the buffer, so we end up with:
  ACK <sha1>\n\0
with the newline at offset 44 and the NUL at offset 45. We then strip the newline, putting a NUL at offset 44. So when we look at "line+45", we are looking past the end of our string; but it's OK, because we hit the terminator from the original string.
This breaks down, however, if the other side does not terminate their packets with a newline. In that case, our packet is one character shorter, and we start looking through uninitialized memory for the flag. No known implementation sends such a packet, so it has never come up in practice.
This patch tightens the check by looking for a short, flagless ACK before trying to parse the flag.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 21:00:28 +01:00
			`if (len < 45)`
			`return ACK;`
			`if (strstr(line+45, "continue"))`
			`return ACK_continue;`
			`if (strstr(line+45, "common"))`
			`return ACK_common;`
			`if (strstr(line+45, "ready"))`
			`return ACK_ready;`
			`return ACK;`
			`}`
			`}`
			`die("git fetch_pack: expected ACK/NAK, got '%s'", line);`
			`}`
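The get_ack fix above hinges on offset arithmetic. `"ACK "` is 4 bytes and a hex SHA-1 is 40, so a flagless ACK is exactly 44 bytes once the newline is stripped, and any flag starts after the separating space at offset 45. A standalone sketch of that parsing follows. `parse_ack_sketch()` is a hypothetical name, and it returns a hypothetical `ACK_invalid` value where the real code calls `die()`.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Same values as the enum above, plus a hypothetical ACK_invalid. */
enum ack_type {
	NAK = 0,
	ACK,
	ACK_continue,
	ACK_common,
	ACK_ready,
	ACK_invalid
};

static int is_hex40(const char *s)
{
	int i;

	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;    /* also stops safely at the terminating NUL */
	return 1;
}

/* line: packet payload with its trailing newline stripped; len: its length */
static enum ack_type parse_ack_sketch(const char *line, int len)
{
	if (!strcmp(line, "NAK"))
		return NAK;
	if (!strncmp(line, "ACK ", 4) && is_hex40(line + 4)) {
		/* 4 + 40 = 44 bytes; a flag can only start at offset 45 */
		if (len < 45)
			return ACK;  /* flagless: never read past the string */
		if (strstr(line + 45, "continue"))
			return ACK_continue;
		if (strstr(line + 45, "common"))
			return ACK_common;
		if (strstr(line + 45, "ready"))
			return ACK_ready;
		return ACK;
	}
	return ACK_invalid;
}
```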

			`static void send_request(struct fetch_pack_args *args,`
			`int fd, struct strbuf *buf)`
			`{`
			`if (args->stateless_rpc) {`
			`send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);`
			`packet_flush(fd);`
			`} else`
pkt-line: drop safe_write function
This is just write_or_die by another name. The one distinction is that write_or_die will treat EPIPE specially by suppressing error messages. That's fine, as we die by SIGPIPE anyway (and in the off chance that it is disabled, write_or_die will simulate it).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 21:01:56 +01:00
			`write_or_die(fd, buf->buf, buf->len);`
			`}`

			`static void insert_one_alternate_ref(const struct ref *ref, void *unused)`
			`{`
			`rev_list_insert_ref(NULL, ref->old_sha1, 0, NULL);`
			`}`

			`#define INITIAL_FLUSH 16`
			`#define PIPESAFE_FLUSH 32`
			`#define LARGE_FLUSH 1024`

			`static int next_flush(struct fetch_pack_args *args, int count)`
			`{`
			`int flush_limit = args->stateless_rpc ? LARGE_FLUSH : PIPESAFE_FLUSH;`

			`if (count < flush_limit)`
			`count <<= 1;`
			`else`
			`count += flush_limit;`
			`return count;`
			`}`
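`next_flush()` doubles the flush point while it is below a limit and then grows it linearly, using a much larger limit when `stateless_rpc` is set. The macro names suggest the small limit is chosen to be safe on a bidirectional pipe and the large one to batch more "have" lines per stateless request, though the code in this section does not state the rationale. A standalone copy, with the flag passed as a plain int under the hypothetical name `next_flush_sketch`, makes the resulting sequence easy to check.

```c
#include <assert.h>

/* Values copied from the #defines above. */
#define INITIAL_FLUSH 16
#define PIPESAFE_FLUSH 32
#define LARGE_FLUSH 1024

/* Same growth rule as next_flush(), minus the fetch_pack_args struct. */
static int next_flush_sketch(int stateless_rpc, int count)
{
	int flush_limit = stateless_rpc ? LARGE_FLUSH : PIPESAFE_FLUSH;

	if (count < flush_limit)
		count <<= 1;          /* double while below the limit */
	else
		count += flush_limit; /* then step linearly by the limit */
	return count;
}
```

Starting from `INITIAL_FLUSH`, repeated calls yield 32, 64, 96, 128, ... without `stateless_rpc`, and 32, 64, ..., 1024, 2048, 3072, ... with it.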

			`static int find_common(struct fetch_pack_args *args,`
			`int fd[2], unsigned char *result_sha1,`
			`struct ref *refs)`
			`{`
			`int fetching;`
			`int count = 0, flushes = 0, flush_at = INITIAL_FLUSH, retval;`
			`const unsigned char *sha1;`
			`unsigned in_vain = 0;`
			`int got_continue = 0;`
			`int got_ready = 0;`
			`struct strbuf req_buf = STRBUF_INIT;`
			`size_t state_len = 0;`

			`if (args->stateless_rpc && multi_ack == 1)`
			`die("--stateless-rpc requires multi_ack_detailed");`
			`if (marked)`
			`for_each_ref(clear_marks, NULL);`
			`marked = 1;`

			`for_each_ref(rev_list_insert_ref, NULL);`
			`for_each_alternate_ref(insert_one_alternate_ref, NULL);`

			`fetching = 0;`
			`for ( ; refs ; refs = refs->next) {`
			`unsigned char *remote = refs->old_sha1;`
			`const char *remote_hex;`
			`struct object *o;`

			`/*`
			`* If that object is complete (i.e. it is an ancestor of a`
			`* local ref), we tell them we have it but do not have to`
			`* tell them about its ancestors, which they already know`
			`* about.`
			`*`
			`* We use lookup_object here because we are only`
			`* interested in the case we know the object is`
			`* reachable and we have already scanned it.`
			`*/`
			`if (((o = lookup_object(remote)) != NULL) &&`
			`(o->flags & COMPLETE)) {`
			`continue;`
			`}`

			`remote_hex = sha1_to_hex(remote);`
			`if (!fetching) {`
			`struct strbuf c = STRBUF_INIT;`
			`if (multi_ack == 2) strbuf_addstr(&c, " multi_ack_detailed");`
			`if (multi_ack == 1) strbuf_addstr(&c, " multi_ack");`
			`if (no_done) strbuf_addstr(&c, " no-done");`
			`if (use_sideband == 2) strbuf_addstr(&c, " side-band-64k");`
			`if (use_sideband == 1) strbuf_addstr(&c, " side-band");`
			`if (args->use_thin_pack) strbuf_addstr(&c, " thin-pack");`
			`if (args->no_progress) strbuf_addstr(&c, " no-progress");`
			`if (args->include_tag) strbuf_addstr(&c, " include-tag");`
			`if (prefer_ofs_delta) strbuf_addstr(&c, " ofs-delta");`
			`if (agent_supported) strbuf_addf(&c, " agent=%s",`
			`git_user_agent_sanitized());`
			`packet_buf_write(&req_buf, "want %s%s\n", remote_hex, c.buf);`
			`strbuf_release(&c);`
			`} else`
			`packet_buf_write(&req_buf, "want %s\n", remote_hex);`
			`fetching++;`
			`}`

			`if (!fetching) {`
			`strbuf_release(&req_buf);`
			`packet_flush(fd[1]);`
			`return 1;`
			`}`

			`if (is_repository_shallow())`
			`write_shallow_commits(&req_buf, 1);`
			`if (args->depth > 0)`
			`packet_buf_write(&req_buf, "deepen %d", args->depth);`
			`packet_buf_flush(&req_buf);`
			`state_len = req_buf.len;`

			`if (args->depth > 0) {`
pkt-line: provide a LARGE_PACKET_MAX static buffer Most of the callers of packet_read_line just read into a static 1000-byte buffer (callers which handle arbitrary binary data already use LARGE_PACKET_MAX). This works fine in practice, because: 1. The only variable-sized data in these lines is a ref name, and refs tend to be a lot shorter than 1000 characters. 2. When sending ref lines, git-core always limits itself to 1000 byte packets. However, the only limit given in the protocol specification in Documentation/technical/protocol-common.txt is LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in pack-protocol.txt, and then only describing what we write, not as a specific limit for readers. This patch lets us bump the 1000-byte limit to LARGE_PACKET_MAX. Even though git-core will never write a packet where this makes a difference, there are two good reasons to do this: 1. Other git implementations may have followed protocol-common.txt and used a larger maximum size. We don't bump into it in practice because it would involve very long ref names. 2. We may want to increase the 1000-byte limit one day. Since packets are transferred before any capabilities, it's difficult to do this in a backwards-compatible way. But if we bump the size of buffer the readers can handle, eventually older versions of git will be obsolete enough that we can justify bumping the writers, as well. We don't have plans to do this anytime soon, but there is no reason not to start the clock ticking now. Just bumping all of the reading bufs to LARGE_PACKET_MAX would waste memory. Instead, since most readers just read into a temporary buffer anyway, let's provide a single static buffer that all callers can use. We can further wrap this detail away by having the packet_read_line wrapper just use the buffer transparently and return a pointer to the static storage. That covers most of the cases, and the remaining ones already read into their own LARGE_PACKET_MAX buffers. 
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-20 21:02:57 +01:00
			`char *line;`
			`unsigned char sha1[20];`

			`send_request(args, fd[1], &req_buf);`
			`while ((line = packet_read_line(fd[0], NULL))) {`
			`if (!prefixcmp(line, "shallow ")) {`
			`if (get_sha1_hex(line + 8, sha1))`
			`die("invalid shallow line: %s", line);`
			`register_shallow(sha1);`
			`continue;`
			`}`
			`if (!prefixcmp(line, "unshallow ")) {`
			`if (get_sha1_hex(line + 10, sha1))`
			`die("invalid unshallow line: %s", line);`
			`if (!lookup_object(sha1))`
			`die("object not found: %s", line);`
			`/* make sure that it is parsed as shallow */`
			`if (!parse_object(sha1))`
			`die("error in object: %s", line);`
			`if (unregister_shallow(sha1))`
			`die("no shallow found: %s", line);`
			`continue;`
			`}`
			`die("expected shallow/unshallow, got %s", line);`
			`}`
			`} else if (!args->stateless_rpc)`
			`send_request(args, fd[1], &req_buf);`

			`if (!args->stateless_rpc) {`
			`/* If we aren't using the stateless-rpc interface`
			`* we don't need to retain the headers.`
			`*/`
			`strbuf_setlen(&req_buf, 0);`
			`state_len = 0;`
			`}`

			`flushes = 0;`
			`retval = -1;`
			`while ((sha1 = get_rev())) {`
			`packet_buf_write(&req_buf, "have %s\n", sha1_to_hex(sha1));`
			`if (args->verbose)`
			`fprintf(stderr, "have %s\n", sha1_to_hex(sha1));`
			`in_vain++;`
			`if (flush_at <= ++count) {`
			`int ack;`

			`packet_buf_flush(&req_buf);`
			`send_request(args, fd[1], &req_buf);`
			`strbuf_setlen(&req_buf, state_len);`
			`flushes++;`
			`flush_at = next_flush(args, count);`

			`/*`
			`* We keep one window "ahead" of the other side, and`
			`* will wait for an ACK only on the next one`
			`*/`
			`if (!args->stateless_rpc && count == INITIAL_FLUSH)`
			`continue;`

			`consume_shallow_list(args, fd[0]);`
			`do {`
			`ack = get_ack(fd[0], result_sha1);`
			`if (args->verbose && ack)`
			`fprintf(stderr, "got ack %d %s\n", ack,`
			`sha1_to_hex(result_sha1));`
			`switch (ack) {`
			`case ACK:`
			`flushes = 0;`
			`multi_ack = 0;`
			`retval = 0;`
			`goto done;`
			`case ACK_common:`
			`case ACK_ready:`
			`case ACK_continue: {`
			`struct commit *commit =`
			`lookup_commit(result_sha1);`
			`if (!commit)`
			`die("invalid commit %s", sha1_to_hex(result_sha1));`
			`if (args->stateless_rpc`
			`&& ack == ACK_common`
			`&& !(commit->object.flags & COMMON)) {`
			`/* We need to replay the have for this object`
			`* on the next RPC request so the peer knows`
			`* it is in common with us.`
			`*/`
			`const char *hex = sha1_to_hex(result_sha1);`
			`packet_buf_write(&req_buf, "have %s\n", hex);`
			`state_len = req_buf.len;`
			`}`
			`mark_common(commit, 0, 1);`
			`retval = 0;`
			`in_vain = 0;`
			`got_continue = 1;`
			`if (ack == ACK_ready) {`
			`clear_prio_queue(&rev_list);`
			`got_ready = 1;`
			`}`
			`break;`
			`}`
			`}`
			`} while (ack);`
			`flushes--;`
			`if (got_continue && MAX_IN_VAIN < in_vain) {`
			`if (args->verbose)`
			`fprintf(stderr, "giving up\n");`
			`break; /* give up */`
			`}`
			`}`
			`}`
			`done:`
			`if (!got_ready || !no_done) {`
			`packet_buf_write(&req_buf, "done\n");`
			`send_request(args, fd[1], &req_buf);`
			`}`
			`if (args->verbose)`
			`fprintf(stderr, "done\n");`
			`if (retval != 0) {`
			`multi_ack = 0;`
			`flushes++;`
			`}`
			`strbuf_release(&req_buf);`

			`consume_shallow_list(args, fd[0]);`
			`while (flushes || multi_ack) {`
			`int ack = get_ack(fd[0], result_sha1);`
			`if (ack) {`
			`if (args->verbose)`
			`fprintf(stderr, "got ack (%d) %s\n", ack,`
			`sha1_to_hex(result_sha1));`
			`if (ack == ACK)`
			`return 0;`
			`multi_ack = 1;`
			`continue;`
			`}`
			`flushes--;`
			`}`
			`/* it is no error to fetch into a completely empty repo */`
			`return count ? retval : 0;`
			`}`
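The request that find_common builds is a stream of pkt-lines: each `want` and `have` line goes through packet_buf_write, and each batch is terminated with packet_buf_flush. Below is a minimal sketch of that framing, assuming the pkt-line layout described in Git's protocol-common.txt: four lowercase hex digits giving the total packet length (the four header bytes included), then the payload, with `0000` as the flush-pkt. The helpers `pkt_append` and `pkt_append_flush` and the constant `PKT_MAX_TOTAL` are hypothetical illustrations, not git's API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Largest pkt-line, header included (git's LARGE_PACKET_MAX is 65520). */
#define PKT_MAX_TOTAL 65520

/*
 * Append one pkt-line carrying `payload` to the NUL-terminated buffer
 * `out` of capacity `cap`, which currently holds `*used` bytes.
 * Returns 0 on success, -1 if the packet is too large or does not fit.
 */
static int pkt_append(char *out, size_t cap, size_t *used, const char *payload)
{
	size_t len;

	assert(out && used && payload);
	len = strlen(payload) + 4;	/* the length field counts itself */
	if (len > PKT_MAX_TOTAL || *used + len + 1 > cap)
		return -1;
	/* four lowercase hex digits, e.g. "0032" for a 46-byte payload */
	snprintf(out + *used, cap - *used, "%04zx", len);
	memcpy(out + *used + 4, payload, len - 4);
	*used += len;
	out[*used] = '\0';
	return 0;
}

/* Append a flush-pkt ("0000"), which ends a batch of requests. */
static int pkt_append_flush(char *out, size_t cap, size_t *used)
{
	assert(out && used);
	if (*used + 5 > cap)
		return -1;
	memcpy(out + *used, "0000", 5);	/* copies the terminating NUL too */
	*used += 4;
	return 0;
}
```

A 46-byte `want <40-hex>\n` payload therefore gets the header `0032`. In find_common only the first want line carries the capability list, appended after the object name; later want lines are plain.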

			`static struct commit_list *complete;`

			`static int mark_complete(const char *refname, const unsigned char *sha1, int flag, void *cb_data)`
			`{`
			`struct object *o = parse_object(sha1);`

			`while (o && o->type == OBJ_TAG) {`
			`struct tag *t = (struct tag *) o;`
			`if (!t->tagged)`
			`break; /* broken repository */`
			`o->flags |= COMPLETE;`
			`o = parse_object(t->tagged->sha1);`
			`}`
			`if (o && o->type == OBJ_COMMIT) {`
			`struct commit *commit = (struct commit *)o;`
			`if (!(commit->object.flags & COMPLETE)) {`
			`commit->object.flags |= COMPLETE;`
fetch-pack: avoid quadratic list insertion in mark_complete We insert the commit pointed to by each ref one-by-one into the "complete" commit_list using insert_by_date. Because each insertion is O(n), we end up with O(n^2) behavior. This typically doesn't matter, because the number of refs is reasonably small. And even if there are a lot of refs, they often point to a smaller set of objects (in which case the optimization in commit ea5f220 keeps our "n" small). However, in pathological repositories (hundreds of thousands of refs, each pointing to a unique commit), this quadratic behavior can make a difference. Since we do not care about the list order until we have finished building it, we can simply keep it unsorted during the insertion phase, then sort it afterwards. On a repository like the one described above, this dropped the time to do a no-op fetch from 2.0s to 1.7s. On normal repositories, it probably does not matter at all, but it does not hurt to protect ourselves from pathological cases. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-02 08:16:23 +02:00
			`commit_list_insert(commit, &complete);`
			`}`
			`}`
			`return 0;`
			`}`
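mark_complete now prepends each commit with commit_list_insert, and everything_local sorts the `complete` list once with commit_list_sort_by_date after walking all refs. That replaces n sorted insertions (O(n) each, O(n^2) total) with n O(1) prepends plus a single O(n log n) sort. A self-contained sketch of the pattern follows, assuming a hypothetical `struct node` that holds only a commit date; the top-down merge sort is a stand-in and not necessarily what commit_list_sort_by_date uses internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct commit_list: just a commit date. */
struct node {
	unsigned long date;
	struct node *next;
};

/* O(1) prepend: order is ignored while the list is being collected. */
static void push_unsorted(struct node **head, struct node *n)
{
	assert(head && n);
	n->next = *head;
	*head = n;
}

/* Merge two lists that are each already sorted newest-first. */
static struct node *merge_by_date(struct node *a, struct node *b)
{
	struct node dummy;
	struct node *tail = &dummy;

	while (a && b) {
		if (a->date >= b->date) {
			tail->next = a;
			a = a->next;
		} else {
			tail->next = b;
			b = b->next;
		}
		tail = tail->next;
	}
	tail->next = a ? a : b;
	return dummy.next;
}

/* Top-down merge sort, newest first: O(n log n) for the whole list. */
static struct node *sort_by_date(struct node *head)
{
	struct node *slow, *fast, *second;

	if (!head || !head->next)
		return head;
	/* find the midpoint with slow/fast pointers and cut the list there */
	slow = head;
	fast = head->next;
	while (fast && fast->next) {
		slow = slow->next;
		fast = fast->next->next;
	}
	second = slow->next;
	slow->next = NULL;
	return merge_by_date(sort_by_date(head), sort_by_date(second));
}
```

Newest-first is the order that mark_recent_complete_commits expects when it inspects the head of `complete`.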

			`static void mark_recent_complete_commits(struct fetch_pack_args *args,`
			`unsigned long cutoff)`
			`{`
			`while (complete && cutoff <= complete->item->date) {`
			`if (args->verbose)`
			`fprintf(stderr, "Marking %s as complete\n",`
			`sha1_to_hex(complete->item->object.sha1));`
			`pop_most_recent_commit(&complete, COMPLETE);`
			`}`
			`}`
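mark_recent_complete_commits relies on `complete` being sorted newest-first. It repeatedly pops the head with pop_most_recent_commit, which marks every parent that does not yet carry the mark and inserts it by date, and it stops at the first commit older than `cutoff`. Below is a sketch of that walk under stated assumptions: the `struct commit` is hypothetical (a date, at most two parents, a flag word, and an intrusive list link), and `COMPLETE` is an illustrative bit value rather than git's.

```c
#include <assert.h>
#include <stddef.h>

#define COMPLETE 1u	/* illustrative flag bit, not git's actual value */

/* Hypothetical minimal commit: date, up to two parents, flags, list link. */
struct commit {
	unsigned long date;
	struct commit *parents[2];
	unsigned flags;
	struct commit *next;	/* intrusive link for the date-ordered list */
};

/* Insert c so that the list stays sorted newest-first. */
static void insert_by_date(struct commit **list, struct commit *c)
{
	while (*list && (*list)->date >= c->date)
		list = &(*list)->next;
	c->next = *list;
	*list = c;
}

/*
 * Pop the newest commit, then mark and queue each parent that does not
 * carry `mark` yet (the same pop-then-queue order as pop_most_recent_commit).
 */
static struct commit *pop_most_recent(struct commit **list, unsigned mark)
{
	struct commit *top = *list;

	assert(top);
	*list = top->next;
	for (int k = 0; k < 2; k++) {
		struct commit *p = top->parents[k];
		if (p && !(p->flags & mark)) {
			p->flags |= mark;
			insert_by_date(list, p);
		}
	}
	return top;
}

/* Expand commits from the head of the list while they are >= cutoff. */
static void mark_recent(struct commit **list, unsigned long cutoff)
{
	while (*list && cutoff <= (*list)->date)
		pop_most_recent(list, COMPLETE);
}
```

As in mark_complete, each tip must be marked before it is queued, so a commit reachable from two tips is queued only once. Parents older than `cutoff` still get marked when they are queued. The walk simply stops expanding them.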

			`static void filter_refs(struct fetch_pack_args *args,`
fetch: use struct ref to represent refs to be fetched Even though "git fetch" has full infrastructure to parse refspecs to be fetched and match them against the list of refs to come up with the final list of refs to be fetched, the list of refs that are requested to be fetched were internally converted to a plain list of strings at the transport layer and then passed to the underlying fetch-pack driver. Stop this conversion and instead pass around an array of refs. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-29 23:02:15 +01:00
			`struct ref **refs,`
			`struct ref **sought, int nr_sought)`
			`{`
			`struct ref *newlist = NULL;`
			`struct ref **newtail = &newlist;`
			`struct ref *ref, *next;`
			`int i;`

			`i = 0;`
			`for (ref = *refs; ref; ref = next) {`
			`int keep = 0;`
			`next = ref->next;`

			`if (!memcmp(ref->name, "refs/", 5) &&`
			`check_refname_format(ref->name + 5, 0))`
			`; /* trash */`
			`else {`
			`while (i < nr_sought) {`
			`int cmp = strcmp(ref->name, sought[i]->name);`
			`if (cmp < 0)`
			`break; /* definitely do not have it */`
			`else if (cmp == 0) {`
			`keep = 1; /* definitely have it */`
			`sought[i]->matched = 1;`
			`}`
			`i++;`
			`}`
			`}`

			`if (!keep && args->fetch_all &&`
			`(!args->depth || prefixcmp(ref->name, "refs/tags/")))`
			`keep = 1;`

			`if (keep) {`
			`*newtail = ref;`
			`ref->next = NULL;`
			`newtail = &ref->next;`
			`} else {`
			`free(ref);`
			`}`
			`}`

fetch: fetch objects by their exact SHA-1 object names Teach "git fetch" to accept an exact SHA-1 object name the user may obtain out of band on the LHS of a pathspec, and send it on a "want" message when the server side advertises the allow-tip-sha1-in-want capability. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-29 23:02:15 +01:00
			`/* Append unmatched requests to the list */`
			`if (allow_tip_sha1_in_want) {`
			`for (i = 0; i < nr_sought; i++) {`
			`ref = sought[i];`
			`if (ref->matched)`
			`continue;`
			`if (get_sha1_hex(ref->name, ref->old_sha1))`
			`continue;`

			`ref->matched = 1;`
			`*newtail = ref;`
			`ref->next = NULL;`
			`newtail = &ref->next;`
			`}`
			`}`
			`*refs = newlist;`
			`}`
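filter_refs walks the advertised refs and the `sought` array in a single merge-style pass. This only works because both are in the same strcmp order, so the cursor `i` never has to move backwards. The matching step can be isolated as below. This is a sketch using plain strings in place of struct ref, and `match_sought` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * One step of the merge-style scan in filter_refs, using plain strings.
 * `sought` must be sorted in strcmp order, and successive calls must pass
 * advertised names in the same order; *i is the shared cursor into
 * `sought`, which only ever moves forward. Returns 1 if `name` was
 * requested, setting matched[] for every equal entry.
 */
static int match_sought(const char *name, const char **sought, int nr_sought,
			int *matched, int *i)
{
	int keep = 0;

	assert(name && i);
	while (*i < nr_sought) {
		int cmp = strcmp(name, sought[*i]);
		if (cmp < 0)
			break;		/* sought[*i] may match a later ref */
		if (cmp == 0) {
			keep = 1;	/* requested: keep this ref */
			matched[*i] = 1;
		}
		/* equal entries are consumed; entries sorting before `name`
		 * can no longer match any advertised ref, so skip them too */
		(*i)++;
	}
	return keep;
}
```

Entries still unmatched after the pass are what the allow_tip_sha1_in_want branch then tries to parse as raw 40-hex object names.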

			`static void mark_alternate_complete(const struct ref *ref, void *unused)`
			`{`
			`mark_complete(NULL, ref->old_sha1, 0, NULL);`
			`}`

			`static int everything_local(struct fetch_pack_args *args,`
			`struct ref **refs,`
			`struct ref **sought, int nr_sought)`
			`{`
			`struct ref *ref;`
			`int retval;`
			`unsigned long cutoff = 0;`

			`save_commit_buffer = 0;`

			`for (ref = *refs; ref; ref = ref->next) {`
			`struct object *o;`

Merge branch 'jk/maint-gc-auto-after-fetch' into jk/gc-auto-after-fetch * jk/maint-gc-auto-after-fetch: fetch-pack: avoid repeatedly re-scanning pack directory fetch: run gc --auto after fetching 2013-01-27 04:42:09 +01:00
			`if (!has_sha1_file(ref->old_sha1))`
			`continue;`

			`o = parse_object(ref->old_sha1);`
			`if (!o)`
			`continue;`

			`/* We already have it -- which may mean that we were`
			`* in sync with the other side at some time after`
			`* that (it is OK if we guess wrong here).`
			`*/`
			`if (o->type == OBJ_COMMIT) {`
			`struct commit *commit = (struct commit *)o;`
			`if (!cutoff || cutoff < commit->date)`
			`cutoff = commit->date;`
			`}`
			`}`

			`if (!args->depth) {`
			`for_each_ref(mark_complete, NULL);`
			`for_each_alternate_ref(mark_alternate_complete, NULL);`
			`commit_list_sort_by_date(&complete);`
			`if (cutoff)`
			`mark_recent_complete_commits(args, cutoff);`
			`}`

			`/*`
			`* Mark all complete remote refs as common refs.`
			`* Don't mark them common yet; the server has to be told so first.`
			`*/`
			`for (ref = *refs; ref; ref = ref->next) {`
			`struct object *o = deref_tag(lookup_object(ref->old_sha1),`
			`NULL, 0);`

			`if (!o || o->type != OBJ_COMMIT || !(o->flags & COMPLETE))`
			`continue;`

			`if (!(o->flags & SEEN)) {`
			`rev_list_push((struct commit *)o, COMMON_REF | SEEN);`

			`mark_common((struct commit *)o, 1, 1);`
			`}`
			`}`

			`filter_refs(args, refs, sought, nr_sought);`

			`for (retval = 1, ref = *refs; ref ; ref = ref->next) {`
			`const unsigned char *remote = ref->old_sha1;`
			`unsigned char local[20];`
			`struct object *o;`

			`o = lookup_object(remote);`
			`if (!o || !(o->flags & COMPLETE)) {`
			`retval = 0;`
			`if (!args->verbose)`
			`continue;`
			`fprintf(stderr,`
			`"want %s (%s)\n", sha1_to_hex(remote),`
			`ref->name);`
			`continue;`
			`}`

			`hashcpy(ref->new_sha1, local);`
			`if (!args->verbose)`
			`continue;`
			`fprintf(stderr,`
			`"already have %s (%s)\n", sha1_to_hex(remote),`
			`ref->name);`
			`}`
			`return retval;`
			`}`

			`static int sideband_demux(int in, int out, void *data)`
			`{`
			`int *xd = data;`

			`int ret = recv_sideband("fetch-pack", xd[0], out);`
			`close(out);`
			`return ret;`
			`}`
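sideband_demux hands the connection to recv_sideband, which splits multiplexed pkt-lines by the band number stored in the first payload byte. Below is a minimal in-memory sketch, assuming the side-band layout described in Git's protocol-capabilities.txt: band 1 carries pack data, band 2 progress text, and band 3 a fatal error message. `demux` and `hex4` are hypothetical helpers. The sketch returns -1 on band 3 without reporting the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode a 4-digit hex pkt-line length; returns -1 if malformed. */
static int hex4(const char *p)
{
	int v = 0;

	for (int k = 0; k < 4; k++) {
		char c = p[k];
		v <<= 4;
		if (c >= '0' && c <= '9')
			v |= c - '0';
		else if (c >= 'a' && c <= 'f')
			v |= c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			v |= c - 'A' + 10;
		else
			return -1;
	}
	return v;
}

/*
 * Demultiplex side-band pkt-lines from `in`. Band 1 (pack data) is copied
 * into `data`, band 2 (progress) is written to `progress`, and band 3
 * (fatal error) or a malformed packet makes the function return -1.
 * Stops at a flush-pkt or at the end of input; returns the number of
 * pack-data bytes collected.
 */
static long demux(const char *in, size_t in_len,
		  char *data, size_t cap, FILE *progress)
{
	size_t pos = 0, out = 0;

	assert(in && data && progress);
	while (pos + 4 <= in_len) {
		int len = hex4(in + pos);
		const char *payload;
		size_t n;

		if (len == 0)
			break;			/* flush-pkt */
		if (len < 5 || pos + (size_t)len > in_len)
			return -1;		/* no band byte, or truncated */
		payload = in + pos + 5;
		n = (size_t)len - 5;
		switch (in[pos + 4]) {
		case 1:
			if (out + n > cap)
				return -1;
			memcpy(data + out, payload, n);
			out += n;
			break;
		case 2:
			fwrite(payload, 1, n, progress);
			break;
		default:		/* band 3 or an unknown band */
			return -1;
		}
		pos += (size_t)len;
	}
	return (long)out;
}
```

Each packet's length covers the header, the band byte, and the data, so a band-1 packet carrying 4 bytes of pack data has the header `0009`.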

			`static int get_pack(struct fetch_pack_args *args,`
			`int xd[2], char **pack_lockfile)`
			`{`
			`struct async demux;`
fetch-pack: prepare updated shallow file before fetching the pack index-pack --strict looks up and follows parent commits. If shallow information is not ready by the time index-pack is run, index-pack may be led to non-existent objects. Make fetch-pack save shallow file to disk before invoking index-pack. git learns new global option --shallow-file to pass on the alternate shallow file path. Undocumented (and not even support --shallow-file= syntax) because it's unlikely to be used again elsewhere. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-26 03:16:15 +02:00
			`const char *argv[22];`
			`char keep_arg[256];`
			`char hdr_arg[256];`
fetch-pack.c: show correct command name that fails When --shallow-file is added to the command line, it has to be before the subcommand name, the first argument won't be the command name any more. Stop assuming that and keep track of the command name explicitly. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-09-18 15:41:18 +02:00
			`const char **av, *cmd_name;`
			`int do_keep = args->keep_pack;`
			`struct child_process cmd;`
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. 
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-26 03:16:17 +02:00
			`int ret;`

			`memset(&demux, 0, sizeof(demux));`
			`if (use_sideband) {`
			`/* xd[] is talking with upload-pack; subprocess reads from`
			`* xd[0], spits out band#2 to stderr, and feeds us band#1`
			`* through demux->out.`
			`*/`
			`demux.proc = sideband_demux;`
			`demux.data = xd;`
			`demux.out = -1;`
			`if (start_async(&demux))`
			`die("fetch-pack: unable to fork off sideband"`
			`" demultiplexer");`
			`}`
			`else`
			`demux.out = xd[0];`

			`memset(&cmd, 0, sizeof(cmd));`
			`cmd.argv = argv;`
			`av = argv;`
			`*hdr_arg = 0;`
			`if (!args->keep_pack && unpack_limit) {`
			`struct pack_header header;`

			`if (read_pack_header(demux.out, &header))`
			`die("protocol error: bad pack header");`
			`snprintf(hdr_arg, sizeof(hdr_arg),`
			`"--pack_header=%"PRIu32",%"PRIu32,`
			`ntohl(header.hdr_version), ntohl(header.hdr_entries));`
			`if (ntohl(header.hdr_entries) < unpack_limit)`
			`do_keep = 0;`
			`else`
			`do_keep = 1;`
			`}`

			`if (alternate_shallow_file) {`
			`*av++ = "--shallow-file";`
			`*av++ = alternate_shallow_file;`
			`}`

			`if (do_keep) {`
			`if (pack_lockfile)`
			`cmd.out = -1;`
			`*av++ = cmd_name = "index-pack";`
			`*av++ = "--stdin";`
			`if (!args->quiet && !args->no_progress)`
			`*av++ = "-v";`
			`if (args->use_thin_pack)`
			`*av++ = "--fix-thin";`
			`if (args->lock_pack || unpack_limit) {`
			`int s = sprintf(keep_arg,`
			`"--keep=fetch-pack %"PRIuMAX " on ", (uintmax_t) getpid());`
			`if (gethostname(keep_arg + s, sizeof(keep_arg) - s))`
			`strcpy(keep_arg + s, "localhost");`
			`*av++ = keep_arg;`
			`}`
			`if (args->check_self_contained_and_connected)`
			`*av++ = "--check-self-contained-and-connected";`
			`}`
			`else {`
			`*av++ = cmd_name = "unpack-objects";`
			`if (args->quiet || args->no_progress)`
			`*av++ = "-q";`
			`args->check_self_contained_and_connected = 0;`
			`}`
			`if (*hdr_arg)`
			`*av++ = hdr_arg;`
			`if (fetch_fsck_objects >= 0`
			`? fetch_fsck_objects`
			`: transfer_fsck_objects >= 0`
			`? transfer_fsck_objects`
			`: 0)`
			`*av++ = "--strict";`
			`*av++ = NULL;`

			`cmd.in = demux.out;`
			`cmd.git_cmd = 1;`
			`if (start_command(&cmd))`
			`die("fetch-pack: unable to fork off %s", cmd_name);`
			`if (do_keep && pack_lockfile) {`
			`*pack_lockfile = index_pack_lockfile(cmd.out);`
			`close(cmd.out);`
			`}`

			`ret = finish_command(&cmd);`
			`if (!ret || (args->check_self_contained_and_connected && ret == 1))`
			`args->self_contained_and_connected =`
			`args->check_self_contained_and_connected &&`
			`ret == 0;`
			`else`
			`die("%s failed", cmd_name);`
			`if (use_sideband && finish_async(&demux))`
			`die("error in sideband demultiplexer");`
			`return 0;`
			`}`
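get_pack() peeks at the pack header before choosing a receiver: packs with fewer than `unpack_limit` objects go to `unpack-objects`, larger ones to `index-pack`, and the version and object count are forwarded via `--pack_header`. The sketch below isolates that decision. It assumes the standard 12-byte header (`PACK`, then a 32-bit version and a 32-bit object count, both in network byte order, which is what the `ntohl()` calls above decode). `choose_unpacker` and `be32` are hypothetical names, not git functions, and the sketch ignores the `keep_pack` and `unpack_limit == 0` paths, where the real code never reads the header.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Decode a 32-bit big-endian (network byte order) value. */
static uint32_t be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: check the signature, format the
 * "--pack_header=<version>,<entries>" argument into hdr_arg and return
 * the command that should receive the pack.  Only the "PACK" signature
 * is checked; the version is passed through unvalidated.
 * Returns NULL for a bad header.
 */
static const char *choose_unpacker(const unsigned char hdr[12],
				   uint32_t unpack_limit,
				   char *hdr_arg, size_t len)
{
	uint32_t version, entries;

	if (memcmp(hdr, "PACK", 4) != 0)
		return NULL;
	version = be32(hdr + 4);
	entries = be32(hdr + 8);
	snprintf(hdr_arg, len, "--pack_header=%" PRIu32 ",%" PRIu32,
		 version, entries);
	/* Small packs are exploded into loose objects, big ones are kept. */
	return entries < unpack_limit ? "unpack-objects" : "index-pack";
}
```

With a limit of 100, a 5-object pack is routed to `unpack-objects` and a 261-object pack to `index-pack`.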

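After the child exits, get_pack() treats status 0 as success and, only when `--check-self-contained-and-connected` was requested, also accepts status 1, presumably the child's way of reporting that the connectivity shortcut described in the clone commit above does not apply. Anything else is fatal. A hypothetical restatement of that rule (the function name is made up for illustration; recall that the unpack-objects path forces the check off):

```c
/*
 * Hypothetical restatement of get_pack()'s exit-status check.
 * Returns 1 if the pack is known to be self-contained and connected,
 * 0 if the fetch succeeded without that guarantee, and -1 where
 * get_pack() would die("%s failed").
 */
static int interpret_unpack_status(int ret, int check_requested)
{
	if (ret == 0 || (check_requested && ret == 1))
		return check_requested && ret == 0;
	return -1;
}
```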
fetch: use struct ref to represent refs to be fetched Even though "git fetch" has full infrastructure to parse refspecs to be fetched and match them against the list of refs to come up with the final list of refs to be fetched, the list of refs that are requested to be fetched were internally converted to a plain list of strings at the transport layer and then passed to the underlying fetch-pack driver. Stop this conversion and instead pass around an array of refs. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-29 23:02:15 +01:00
			`static int cmp_ref_by_name(const void *a_, const void *b_)`
			`{`
			`const struct ref *a = *((const struct ref **)a_);`
			`const struct ref *b = *((const struct ref **)b_);`
			`return strcmp(a->name, b->name);`
			`}`
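Because `sought` is an array of `struct ref *`, `qsort()` hands the comparator pointers to those pointers, so each argument has to be cast to a pointer-to-pointer and dereferenced once before `->name` is read. A self-contained sketch of the same pattern, using a hypothetical `struct named_ref` in place of git's `struct ref`:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's struct ref; only the name is used. */
struct named_ref {
	const char *name;
};

/*
 * qsort() passes pointers to the array elements.  The elements are
 * pointers themselves, so each argument is a pointer to a pointer and
 * must be dereferenced once before ->name can be read.
 */
static int cmp_named_ref(const void *a_, const void *b_)
{
	const struct named_ref *a = *(const struct named_ref *const *)a_;
	const struct named_ref *b = *(const struct named_ref *const *)b_;

	return strcmp(a->name, b->name);
}

/* Sort an array of pointers by name, as do_fetch_pack() sorts sought. */
static void sort_by_name(struct named_ref **refs, size_t nr)
{
	qsort(refs, nr, sizeof(*refs), cmp_named_ref);
}
```

Only the order of the pointers in the array changes; the structs they point to stay where they are.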

			`static struct ref *do_fetch_pack(struct fetch_pack_args *args,`
			`int fd[2],`
			`const struct ref *orig_ref,`
			`struct ref **sought, int nr_sought,`
			`char **pack_lockfile)`
			`{`
			`struct ref *ref = copy_ref_list(orig_ref);`
			`unsigned char sha1[20];`
			`const char *agent_feature;`
			`int agent_len;`

			`sort_ref_list(&ref, ref_compare_name);`
			`qsort(sought, nr_sought, sizeof(*sought), cmp_ref_by_name);`

			`if (is_repository_shallow() && !server_supports("shallow"))`
			`die("Server does not support shallow clients");`
			`if (server_supports("multi_ack_detailed")) {`
			`if (args->verbose)`
			`fprintf(stderr, "Server supports multi_ack_detailed\n");`
			`multi_ack = 2;`
			`if (server_supports("no-done")) {`
			`if (args->verbose)`
			`fprintf(stderr, "Server supports no-done\n");`
			`if (args->stateless_rpc)`
			`no_done = 1;`
			`}`
			`}`
			`else if (server_supports("multi_ack")) {`
			`if (args->verbose)`
			`fprintf(stderr, "Server supports multi_ack\n");`
			`multi_ack = 1;`
			`}`
			`if (server_supports("side-band-64k")) {`
			`if (args->verbose)`
			`fprintf(stderr, "Server supports side-band-64k\n");`
			`use_sideband = 2;`
			`}`
			`else if (server_supports("side-band")) {`
			`if (args->verbose)`
			`fprintf(stderr, "Server supports side-band\n");`
			`use_sideband = 1;`
			`}`
fetch: fetch objects by their exact SHA-1 object names Teach "git fetch" to accept an exact SHA-1 object name the user may obtain out of band on the LHS of a pathspec, and send it on a "want" message when the server side advertises the allow-tip-sha1-in-want capability. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-29 23:02:15 +01:00
			`if (server_supports("allow-tip-sha1-in-want")) {`
			`if (args->verbose)`
			`fprintf(stderr, "Server supports allow-tip-sha1-in-want\n");`
			`allow_tip_sha1_in_want = 1;`
			`}`
			`if (!server_supports("thin-pack"))`
			`args->use_thin_pack = 0;`
			`if (!server_supports("no-progress"))`
			`args->no_progress = 0;`
			`if (!server_supports("include-tag"))`
			`args->include_tag = 0;`
			`if (server_supports("ofs-delta")) {`
			`if (args->verbose)`
			`fprintf(stderr, "Server supports ofs-delta\n");`
			`} else`
			`prefer_ofs_delta = 0;`

			`if ((agent_feature = server_feature_value("agent", &agent_len))) {`
			`agent_supported = 1;`
			`if (args->verbose && agent_len)`
			`fprintf(stderr, "Server version is %.*s\n",`
			`agent_len, agent_feature);`
			`}`

			`if (everything_local(args, &ref, sought, nr_sought)) {`
			`packet_flush(fd[1]);`
			`goto all_done;`
			`}`
			`if (find_common(args, fd, sha1, ref) < 0)`
			`if (!args->keep_pack)`
			`/* When cloning, it is not unusual to have`
			`* no common commit.`
			`*/`
			`warning("no common commits");`

			`if (args->stateless_rpc)`
			`packet_flush(fd[1]);`
			`if (args->depth > 0)`
move setup_alternate_shallow and write_shallow_commits to shallow.c Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-16 11:52:02 +02:00
			`setup_alternate_shallow(&shallow_lock, &alternate_shallow_file);`
fetch-pack: do not remove .git/shallow file when --depth is not specified fetch_pack() can remove .git/shallow file when a shallow repository becomes a full one again. This behavior is triggered incorrectly when tags are also fetched because fetch_pack() will be called twice. At the first fetch_pack() call: - shallow_lock is set up - alternate_shallow_file points to shallow_lock.filename, which is "shallow.lock" - commit_lock_file is called, which sets shallow_lock.filename to "". alternate_shallow_file also becomes "" because it points to the same memory. At the second call, setup_alternate_shallow() is not called and alternate_shallow_file remains "". It's mistaken as unshallow case and .git/shallow is removed. The end result is a broken repository. Fix this by always initializing alternate_shallow_file when fetch_pack() is called. As an extra measure, check if args->depth > 0 before commit/rollback shallow file. Reported-by: Kacper Kornet <kornet@camk.edu.pl> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-26 04:17:26 +02:00
			`else`
			`alternate_shallow_file = NULL;`
			`if (get_pack(args, fd, pack_lockfile))`
			`die("git fetch-pack: fetch failed.");`

			`all_done:`
			`return ref;`
			`}`
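The capability checks in do_fetch_pack() pick the strongest variant the server advertises: `multi_ack_detailed` (level 2) over `multi_ack` (level 1), and `side-band-64k` (2) over `side-band` (1). A minimal sketch of that preference order, assuming the capabilities arrive as one space-separated string; `has_capability`, `multi_ack_level` and `sideband_level` are hypothetical names that only approximate git's `server_supports()`.

```c
#include <string.h>

/*
 * Hypothetical helper: does the space-separated list caps contain name
 * as a whole token?  "name=value" tokens (such as agent=...) also match,
 * while "side-band" does not match inside "side-band-64k".
 */
static int has_capability(const char *caps, const char *name)
{
	size_t len = strlen(name);
	const char *p = caps;

	if (len == 0)
		return 0;
	while ((p = strstr(p, name)) != NULL) {
		int starts = (p == caps || p[-1] == ' ');
		int ends = (p[len] == '\0' || p[len] == ' ' || p[len] == '=');
		if (starts && ends)
			return 1;
		p += len;
	}
	return 0;
}

/* Same preference order as the multi_ack checks in do_fetch_pack(). */
static int multi_ack_level(const char *caps)
{
	if (has_capability(caps, "multi_ack_detailed"))
		return 2;
	if (has_capability(caps, "multi_ack"))
		return 1;
	return 0;
}

/* Same preference order as the side-band checks in do_fetch_pack(). */
static int sideband_level(const char *caps)
{
	if (has_capability(caps, "side-band-64k"))
		return 2;
	if (has_capability(caps, "side-band"))
		return 1;
	return 0;
}
```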

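The shallow-file commit annotated above describes an aliasing bug: `alternate_shallow_file` pointed into `shallow_lock`'s own filename buffer, so committing the lock turned it into an empty string, and a second `fetch_pack()` call without `--depth` mistook that for `--unshallow`. The toy model below reproduces the mechanism. `struct lock_model`, `setup_lock`, `commit_lock` and `fetch_model` are hypothetical stand-ins, not git's lockfile API, and a legitimate `--unshallow` is not modelled.

```c
#include <string.h>

/* Toy stand-in for a lock file: only the filename buffer matters. */
struct lock_model {
	char filename[32];
};

/* Per the commit message: *path ends up aliasing the lock's buffer. */
static void setup_lock(struct lock_model *lk, const char **path)
{
	strcpy(lk->filename, "shallow.lock");
	*path = lk->filename;
}

/* Committing empties the buffer, and therefore *path as well. */
static void commit_lock(struct lock_model *lk)
{
	lk->filename[0] = '\0';
}

/*
 * One fetch_pack() call.  Returns 1 if .git/shallow would be removed.
 * fixed == 0 models the old code; fixed != 0 resets *path on every call
 * and only touches the shallow file when depth > 0, as the fix describes.
 */
static int fetch_model(struct lock_model *lk, const char **path,
		       int depth, int fixed)
{
	int may_touch_shallow;

	if (depth > 0)
		setup_lock(lk, path);
	else if (fixed)
		*path = NULL;

	may_touch_shallow = fixed ? depth > 0 : 1;
	if (!may_touch_shallow || !*path)
		return 0;
	if (**path == '\0')
		return 1; /* mistaken for --unshallow */
	commit_lock(lk);
	return 0;
}
```

Calling `fetch_model(&lk, &path, 1, 0)` and then `fetch_model(&lk, &path, 0, 0)` returns 0 and then 1, reproducing the reported breakage; with `fixed` set, both calls return 0.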
			`static int fetch_pack_config(const char *var, const char *value, void *cb)`
			`{`
			`if (strcmp(var, "fetch.unpacklimit") == 0) {`
			`fetch_unpack_limit = git_config_int(var, value);`
			`return 0;`
			`}`

			`if (strcmp(var, "transfer.unpacklimit") == 0) {`
			`transfer_unpack_limit = git_config_int(var, value);`
			`return 0;`
			`}`

			`if (strcmp(var, "repack.usedeltabaseoffset") == 0) {`
			`prefer_ofs_delta = git_config_bool(var, value);`
			`return 0;`
			`}`

			`if (!strcmp(var, "fetch.fsckobjects")) {`
			`fetch_fsck_objects = git_config_bool(var, value);`
			`return 0;`
			`}`

			`if (!strcmp(var, "transfer.fsckobjects")) {`
			`transfer_fsck_objects = git_config_bool(var, value);`
			`return 0;`
			`}`

			`return git_default_config(var, value, cb);`
			`}`

			`static void fetch_pack_setup(void)`
			`{`
			`static int did_setup;`
			`if (did_setup)`
			`return;`
			`git_config(fetch_pack_config, NULL);`
			`if (0 <= transfer_unpack_limit)`
			`unpack_limit = transfer_unpack_limit;`
			`else if (0 <= fetch_unpack_limit)`
			`unpack_limit = fetch_unpack_limit;`
			`did_setup = 1;`
			`}`
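Both unpack limits and both fsck switches are tri-state: the `0 <=` and `>= 0` tests suggest that `-1` means "not configured". The two precedence rules in this file run in opposite directions: fetch_pack_setup() consults `transfer.unpacklimit` before `fetch.unpacklimit`, while get_pack() lets `fetch.fsckobjects` override `transfer.fsckobjects`. A hypothetical pair of helpers restating both rules; `dflt` stands in for the initial value of `unpack_limit`, which is set elsewhere in the file.

```c
/* Tri-state config values: -1 is assumed to mean "not set". */

/* Mirrors fetch_pack_setup(): transfer.unpacklimit is checked first. */
static int resolve_unpack_limit(int transfer_limit, int fetch_limit, int dflt)
{
	if (0 <= transfer_limit)
		return transfer_limit;
	if (0 <= fetch_limit)
		return fetch_limit;
	return dflt;
}

/* Mirrors get_pack(): fetch.fsckobjects wins; default is off. */
static int resolve_fsck(int fetch_fsck, int transfer_fsck)
{
	return fetch_fsck >= 0 ? fetch_fsck
	     : transfer_fsck >= 0 ? transfer_fsck
	     : 0;
}
```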

			`static int remove_duplicates_in_refs(struct ref **ref, int nr)`
			`{`
			`struct string_list names = STRING_LIST_INIT_NODUP;`
			`int src, dst;`

			`for (src = dst = 0; src < nr; src++) {`
			`struct string_list_item *item;`
			`item = string_list_insert(&names, ref[src]->name);`
			`if (item->util)`
			`continue; /* already have it */`
			`item->util = ref[src];`
			`if (src != dst)`
			`ref[dst] = ref[src];`
			`dst++;`
			`}`
			`for (src = dst; src < nr; src++)`
			`ref[src] = NULL;`
			`string_list_clear(&names, 0);`
			`return dst;`
			`}`
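remove_duplicates_in_refs() keeps the first occurrence of each name, compacts the survivors to the front in their original order, clears the tail to NULL and returns the new count. A simplified sketch of the same contract over plain strings; `remove_duplicate_names` is hypothetical, and where git uses a `string_list` as the "seen" set, this version swaps in a linear scan over the kept prefix (O(n^2), but no allocation).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical order-preserving, in-place de-duplication. */
static int remove_duplicate_names(const char **name, int nr)
{
	int src, dst, i;

	for (src = dst = 0; src < nr; src++) {
		int seen = 0;

		/* Has this name already been kept in name[0..dst-1]? */
		for (i = 0; i < dst; i++) {
			if (!strcmp(name[i], name[src])) {
				seen = 1;
				break;
			}
		}
		if (seen)
			continue; /* already have it */
		name[dst++] = name[src];
	}
	/* Clear the now-unused tail, as the original does. */
	for (src = dst; src < nr; src++)
		name[src] = NULL;
	return dst;
}
```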

			`struct ref *fetch_pack(struct fetch_pack_args *args,`
			`int fd[], struct child_process *conn,`
			`const struct ref *ref,`
			`const char *dest,`
			`struct ref **sought, int nr_sought,`
			`char **pack_lockfile)`
			`{`
			`struct ref *ref_cpy;`

			`fetch_pack_setup();`
			`if (nr_sought)`
			`nr_sought = remove_duplicates_in_refs(sought, nr_sought);`

			`if (!ref) {`
			`packet_flush(fd[1]);`
			`die("no matching remote head");`
			`}`
			`ref_cpy = do_fetch_pack(args, fd, ref, sought, nr_sought, pack_lockfile);`

			`if (args->depth > 0 && alternate_shallow_file) {`
fetch-pack: prepare updated shallow file before fetching the pack index-pack --strict looks up and follows parent commits. If shallow information is not ready by the time index-pack is run, index-pack may be led to non-existent objects. Make fetch-pack save shallow file to disk before invoking index-pack. git learns new global option --shallow-file to pass on the alternate shallow file path. Undocumented (and not even support --shallow-file= syntax) because it's unlikely to be used again elsewhere. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-26 03:16:15 +02:00			`if (*alternate_shallow_file == '\0') { /* --unshallow */`
			`unlink_or_warn(git_path("shallow"));`
			`rollback_lock_file(&shallow_lock);`
			`} else`
			`commit_lock_file(&shallow_lock);`
fetch-pack: move core code to libgit.a fetch_pack() is used by transport.c, part of libgit.a while it stays in builtin/fetch-pack.c. Move it to fetch-pack.c so that we won't get undefined reference if a program that uses libgit.a happens to pull it in. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Jeff King <peff@peff.net> 2012-10-26 17:53:55 +02:00			`}`

			`reprepare_packed_git();`
			`return ref_cpy;`
			`}`
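The commit message on the `args->depth > 0` line describes an aliasing bug. `alternate_shallow_file` pointed into `shallow_lock`'s own filename buffer, `commit_lock_file()` emptied that buffer, and a second `fetch_pack()` call (the tag fetch) then saw `""` and took the `--unshallow` branch. The C sketch below models only that sequence, under simplifying assumptions. `struct lock_sketch`, `commit_lock_sketch()` and `fetch_pack_sketch()` are hypothetical stand-ins, not Git's lock-file API. A return value of 1 means "would unlink .git/shallow". The `fixed` flag switches between the old behaviour and the patched one, which resets the pointer when no depth is given and adds the `depth > 0` guard.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Git's lock file: only the path buffer is modelled. */
struct lock_sketch {
	char filename[64];
};

static struct lock_sketch shallow_lock;
/* Models the file-scope alternate_shallow_file pointer in fetch-pack.c. */
static const char *alternate_shallow_file;

/* Models the effect the commit message ascribes to commit_lock_file():
 * once committed, the lock's filename becomes "". */
static void commit_lock_sketch(struct lock_sketch *lk)
{
	lk->filename[0] = '\0';
}

/*
 * One simulated fetch_pack() call (hypothetical helper). Returns 1 if the
 * call would take the --unshallow branch and unlink .git/shallow, else 0.
 * fixed == 0 reproduces the buggy behaviour, fixed == 1 the patched one.
 */
static int fetch_pack_sketch(int depth, int fixed)
{
	if (depth > 0) {
		/* like setup_alternate_shallow(): point at the lock's own buffer */
		strcpy(shallow_lock.filename, "shallow.lock");
		alternate_shallow_file = shallow_lock.filename;
	} else if (fixed) {
		/* the fix: never reuse a pointer left over from an earlier call */
		alternate_shallow_file = NULL;
	}

	/* the fix also adds the depth > 0 guard */
	if ((!fixed || depth > 0) && alternate_shallow_file) {
		if (*alternate_shallow_file == '\0')
			return 1; /* mistaken for --unshallow: .git/shallow removed */
		/* committing also empties the string alternate_shallow_file points to */
		commit_lock_sketch(&shallow_lock);
	}
	return 0;
}
```

Calling `fetch_pack_sketch(1, 0)` and then `fetch_pack_sketch(0, 0)` mirrors a shallow branch fetch followed by a tag fetch. The second call returns 1 because the stale pointer now reads `""`. With `fixed` set to 1, both calls return 0. Either half of the fix alone would stop this particular sequence, which matches the commit message's description of the depth check as "an extra measure".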