mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-01 06:47:52 +01:00

235 lines

6 KiB

C

Raw Normal View History

Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`#include "cache.h"`
			`#include "tag.h"`
			`#include "commit.h"`
			`#include "tree.h"`
			`#include "blob.h"`
			`#include "diff.h"`
			`#include "tree-walk.h"`
			`#include "revision.h"`
			`#include "list-objects.h"`

			`static void process_blob(struct rev_info *revs,`
			`struct blob *blob,`
process_{tree,blob}: show objects without buffering Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list, this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it) I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of obejcts are horrible from a memory allocation standpoint). But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-11 02:27:58 +02:00			`show_object_fn show,`
list-objects: drop name_path entirely In the previous commit, we left name_path as a thin wrapper around a strbuf. This patch drops it entirely. As a result, every show_object_fn callback needs to be adjusted. However, none of their code needs to be changed at all, because the only use was to pass it to path_name(), which now handles the bare strbuf. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:26:44 +01:00			`struct strbuf *path,`
list-objects: pass callback data to show_objects() The traverse_commit_list() API takes two callback functions, one to show commit objects, and the other to show other kinds of objects. Even though the former has a callback data parameter, so that the callback does not have to rely on global state, the latter does not. Give the show_objects() callback the same callback data parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-09-02 00:43:33 +02:00			`const char *name,`
			`void *cb_data)`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`{`
			`struct object *obj = &blob->object;`
list-objects: pass full pathname to callbacks When we find a blob at "a/b/c", we currently pass this to our show_object_fn callbacks as two components: "a/b/" and "c". Callbacks which want the full value then call path_name(), which concatenates the two. But this is an inefficient interface; the path is a strbuf, and we could simply append "c" to it temporarily, then roll back the length, without creating a new copy. So we could improve this by teaching the callsites of path_name() this trick (and there are only 3). But we can also notice that no callback actually cares about the broken-down representation, and simply pass each callback the full path "a/b/c" as a string. The callback code becomes even simpler, then, as we do not have to worry about freeing an allocated buffer, nor rolling back our modification to the strbuf. This is theoretically less efficient, as some callbacks would not bother to format the final path component. But in practice this is not measurable. Since we use the same strbuf over and over, our work to grow it is amortized, and we really only pay to memcpy a few bytes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:28:36 +01:00			`size_t pathlen;`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00
			`if (!revs->blob_objects)`
			`return;`
list-objects.c::process_tree/blob: check for NULL As these functions are directly called with the result from lookup_tree/blob, they must handle NULL. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-18 21:47:56 +01:00			`if (!obj)`
			`die("bad blob object");`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`if (obj->flags & (UNINTERESTING \| SEEN))`
			`return;`
			`obj->flags \|= SEEN;`
list-objects: pass full pathname to callbacks When we find a blob at "a/b/c", we currently pass this to our show_object_fn callbacks as two components: "a/b/" and "c". Callbacks which want the full value then call path_name(), which concatenates the two. But this is an inefficient interface; the path is a strbuf, and we could simply append "c" to it temporarily, then roll back the length, without creating a new copy. So we could improve this by teaching the callsites of path_name() this trick (and there are only 3). But we can also notice that no callback actually cares about the broken-down representation, and simply pass each callback the full path "a/b/c" as a string. The callback code becomes even simpler, then, as we do not have to worry about freeing an allocated buffer, nor rolling back our modification to the strbuf. This is theoretically less efficient, as some callbacks would not bother to format the final path component. But in practice this is not measurable. Since we use the same strbuf over and over, our work to grow it is amortized, and we really only pay to memcpy a few bytes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:28:36 +01:00
			`pathlen = path->len;`
			`strbuf_addstr(path, name);`
			`show(obj, path->buf, cb_data);`
			`strbuf_setlen(path, pathlen);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`}`

Teach git list-objects logic to not follow gitlinks This allows us to pack superprojects and thus clone them (but not yet check them out on the receiving side.. That's the next patch) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 18:25:01 +02:00			`/*`
			`* Processing a gitlink entry currently does nothing, since`
			`* we do not recurse into the subproject.`
			`*`
			`* We could eventually add a flag that actually does that,`
			`* which would involve:`
			`* - is the subproject actually checked out?`
			`* - if so, see if the subproject has already been added`
			`* to the alternates list, and add it if not.`
			`* - process the commit (or tag) the gitlink points to`
			`* recursively.`
			`*`
			`* However, it's unclear whether there is really ever any`
			`* reason to see superprojects and subprojects as such a`
			`* "unified" object pool (potentially resulting in a totally`
			`* humongous pack - avoiding which was the whole point of`
			`* having gitlinks in the first place!).`
			`*`
			`* So for now, there is just a note that we could follow`
			`* the link, and how to do it. Whether it necessarily makes`
			`* any sense what-so-ever to ever do that is another issue.`
			`*/`
			`static void process_gitlink(struct rev_info *revs,`
			`const unsigned char *sha1,`
process_{tree,blob}: show objects without buffering Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list, this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it) I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of obejcts are horrible from a memory allocation standpoint). But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-11 02:27:58 +02:00			`show_object_fn show,`
list-objects: drop name_path entirely In the previous commit, we left name_path as a thin wrapper around a strbuf. This patch drops it entirely. As a result, every show_object_fn callback needs to be adjusted. However, none of their code needs to be changed at all, because the only use was to pass it to path_name(), which now handles the bare strbuf. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:26:44 +01:00			`struct strbuf *path,`
list-objects: pass callback data to show_objects() The traverse_commit_list() API takes two callback functions, one to show commit objects, and the other to show other kinds of objects. Even though the former has a callback data parameter, so that the callback does not have to rely on global state, the latter does not. Give the show_objects() callback the same callback data parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-09-02 00:43:33 +02:00			`const char *name,`
			`void *cb_data)`
Teach git list-objects logic to not follow gitlinks This allows us to pack superprojects and thus clone them (but not yet check them out on the receiving side.. That's the next patch) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 18:25:01 +02:00			`{`
			`/* Nothing to do */`
			`}`

Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`static void process_tree(struct rev_info *revs,`
			`struct tree *tree,`
process_{tree,blob}: show objects without buffering Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list, this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it) I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of obejcts are horrible from a memory allocation standpoint). But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-11 02:27:58 +02:00			`show_object_fn show,`
Make rev-list --objects work together with pathspecs When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 14:26:47 +01:00			`struct strbuf *base,`
list-objects: pass callback data to show_objects() The traverse_commit_list() API takes two callback functions, one to show commit objects, and the other to show other kinds of objects. Even though the former has a callback data parameter, so that the callback does not have to rely on global state, the latter does not. Give the show_objects() callback the same callback data parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-09-02 00:43:33 +02:00			`const char *name,`
			`void *cb_data)`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`{`
			`struct object *obj = &tree->object;`
			`struct tree_desc desc;`
			`struct name_entry entry;`
tree_entry_interesting(): give meaningful names to return values It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:10 +02:00			`enum interesting match = revs->diffopt.pathspec.nr == 0 ?`
			`all_entries_interesting: entry_not_interesting;`
Make rev-list --objects work together with pathspecs When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 14:26:47 +01:00			`int baselen = base->len;`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00
			`if (!revs->tree_objects)`
			`return;`
list-objects.c::process_tree/blob: check for NULL As these functions are directly called with the result from lookup_tree/blob, they must handle NULL. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-18 21:47:56 +01:00			`if (!obj)`
			`die("bad tree object");`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`if (obj->flags & (UNINTERESTING \| SEEN))`
			`return;`
silence broken link warnings with revs->ignore_missing_links We set revs->ignore_missing_links to instruct the revision-walking machinery that we know the history graph may be incomplete. For example, we use it when walking unreachable but recent objects; we want to add what we can, but it's OK if the history is incomplete. However, we still print error messages for the missing objects, which can be confusing. This is not an error, but just a normal situation when transitioning from a repository last pruned by an older git (which can leave broken segments of history) to a more recent one (where we try to preserve whole reachable segments). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-06-01 11:56:37 +02:00			`if (parse_tree_gently(tree, revs->ignore_missing_links) < 0) {`
add `ignore_missing_links` mode to revwalk When pack-objects is computing the reachability bitmap to serve a fetch request, it can erroneously die() if some of the UNINTERESTING objects are not present. Upload-pack throws away HAVE lines from the client for objects we do not have, but we may have a tip object without all of its ancestors (e.g., if the tip is no longer reachable and was new enough to survive a `git prune`, but some of its reachable objects did get pruned). In the non-bitmap case, we do a revision walk with the HAVE objects marked as UNINTERESTING. The revision walker explicitly ignores errors in accessing UNINTERESTING commits to handle this case (and we do not bother looking at UNINTERESTING trees or blobs at all). When we have bitmaps, however, the process is quite different. The bitmap index for a pack-objects run is calculated in two separate steps: First, we perform an extensive walk from all the HAVEs to find the full set of objects reachable from them. This walk is usually optimized away because we are expected to hit an object with a bitmap during the traversal, which allows us to terminate early. Secondly, we perform an extensive walk from all the WANTs, which usually also terminates early because we hit a commit with an existing bitmap. Once we have the resulting bitmaps from the two walks, we AND-NOT them together to obtain the resulting set of objects we need to pack. When we are walking the HAVE objects, the revision walker does not know that we are walking it only to mark the results as uninteresting. We strip out the UNINTERESTING flag, because those objects _are_ interesting to us during the first walk. We want to keep going to get a complete set of reachable objects if we can. We need some way to tell the revision walker that it's OK to silently truncate the HAVE walk, just like it does for the UNINTERESTING case. This patch introduces a new `ignore_missing_links` flag to the `rev_info` struct, which we set only for the HAVE walk. It also adds tests to cover UNINTERESTING objects missing from several positions: a missing blob, a missing tree, and a missing parent commit. The missing blob already worked (as we do not care about its contents at all), but the other two cases caused us to die(). Note that there are a few cases we do not need to test: 1. We do not need to test a missing tree, with the blob still present. Without the tree that refers to it, we would not know that the blob is relevant to our walk. 2. We do not need to test a tip commit that is missing. Upload-pack omits these for us (and in fact, we complain even in the non-bitmap case if it fails to do so). Reported-by: Siddharth Agarwal <sid0@fb.com> Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-28 11:00:43 +01:00			`if (revs->ignore_missing_links)`
			`return;`
Convert struct object to object_id struct object is one of the major data structures dealing with object IDs. Convert it to use struct object_id instead of an unsigned char array. Convert get_object_hash to refer to the new member as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net> 2015-11-10 03:22:28 +01:00			`die("bad tree object %s", oid_to_hex(&obj->oid));`
add `ignore_missing_links` mode to revwalk When pack-objects is computing the reachability bitmap to serve a fetch request, it can erroneously die() if some of the UNINTERESTING objects are not present. Upload-pack throws away HAVE lines from the client for objects we do not have, but we may have a tip object without all of its ancestors (e.g., if the tip is no longer reachable and was new enough to survive a `git prune`, but some of its reachable objects did get pruned). In the non-bitmap case, we do a revision walk with the HAVE objects marked as UNINTERESTING. The revision walker explicitly ignores errors in accessing UNINTERESTING commits to handle this case (and we do not bother looking at UNINTERESTING trees or blobs at all). When we have bitmaps, however, the process is quite different. The bitmap index for a pack-objects run is calculated in two separate steps: First, we perform an extensive walk from all the HAVEs to find the full set of objects reachable from them. This walk is usually optimized away because we are expected to hit an object with a bitmap during the traversal, which allows us to terminate early. Secondly, we perform an extensive walk from all the WANTs, which usually also terminates early because we hit a commit with an existing bitmap. Once we have the resulting bitmaps from the two walks, we AND-NOT them together to obtain the resulting set of objects we need to pack. When we are walking the HAVE objects, the revision walker does not know that we are walking it only to mark the results as uninteresting. We strip out the UNINTERESTING flag, because those objects _are_ interesting to us during the first walk. We want to keep going to get a complete set of reachable objects if we can. We need some way to tell the revision walker that it's OK to silently truncate the HAVE walk, just like it does for the UNINTERESTING case. This patch introduces a new `ignore_missing_links` flag to the `rev_info` struct, which we set only for the HAVE walk. It also adds tests to cover UNINTERESTING objects missing from several positions: a missing blob, a missing tree, and a missing parent commit. The missing blob already worked (as we do not care about its contents at all), but the other two cases caused us to die(). Note that there are a few cases we do not need to test: 1. We do not need to test a missing tree, with the blob still present. Without the tree that refers to it, we would not know that the blob is relevant to our walk. 2. We do not need to test a tip commit that is missing. Upload-pack omits these for us (and in fact, we complain even in the non-bitmap case if it fails to do so). Reported-by: Siddharth Agarwal <sid0@fb.com> Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-28 11:00:43 +01:00			`}`
list-objects: convert name_path to a strbuf The "struct name_path" data is examined in only two places: we generate it in process_tree(), and we convert it to a single string in path_name(). Everyone else just passes it through to those functions. We can further note that process_tree() already keeps a single strbuf with the leading tree path, for use with tree_entry_interesting(). Instead of building a separate name_path linked list, let's just use the one we already build in "base". This reduces the amount of code (especially tricky code in path_name() which did not check for integer overflows caused by deep or large pathnames). It is also more efficient in some instances. Any time we were using tree_entry_interesting, we were building up the strbuf anyway, so this is an immediate and obvious win there. In cases where we were not, we trade off storing "pathname/" in a strbuf on the heap for each level of the path, instead of two pointers and an int on the stack (with one pointer into the tree object). On a 64-bit system, the latter is 20 bytes; so if path components are less than that on average, this has lower peak memory usage. In practice it probably doesn't matter either way; we are already holding in memory all of the tree objects leading up to each pathname, and for normal-depth pathnames, we are only talking about hundreds of bytes. This patch leaves "struct name_path" as a thin wrapper around the strbuf, to avoid disrupting callbacks. We should fix them, but leaving it out makes this diff easier to view. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:26:18 +01:00
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`obj->flags \|= SEEN;`
list-objects: convert name_path to a strbuf The "struct name_path" data is examined in only two places: we generate it in process_tree(), and we convert it to a single string in path_name(). Everyone else just passes it through to those functions. We can further note that process_tree() already keeps a single strbuf with the leading tree path, for use with tree_entry_interesting(). Instead of building a separate name_path linked list, let's just use the one we already build in "base". This reduces the amount of code (especially tricky code in path_name() which did not check for integer overflows caused by deep or large pathnames). It is also more efficient in some instances. Any time we were using tree_entry_interesting, we were building up the strbuf anyway, so this is an immediate and obvious win there. In cases where we were not, we trade off storing "pathname/" in a strbuf on the heap for each level of the path, instead of two pointers and an int on the stack (with one pointer into the tree object). On a 64-bit system, the latter is 20 bytes; so if path components are less than that on average, this has lower peak memory usage. In practice it probably doesn't matter either way; we are already holding in memory all of the tree objects leading up to each pathname, and for normal-depth pathnames, we are only talking about hundreds of bytes. This patch leaves "struct name_path" as a thin wrapper around the strbuf, to avoid disrupting callbacks. We should fix them, but leaving it out makes this diff easier to view. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:26:18 +01:00			`strbuf_addstr(base, name);`
list-objects: pass full pathname to callbacks When we find a blob at "a/b/c", we currently pass this to our show_object_fn callbacks as two components: "a/b/" and "c". Callbacks which want the full value then call path_name(), which concatenates the two. But this is an inefficient interface; the path is a strbuf, and we could simply append "c" to it temporarily, then roll back the length, without creating a new copy. So we could improve this by teaching the callsites of path_name() this trick (and there are only 3). But we can also notice that no callback actually cares about the broken-down representation, and simply pass each callback the full path "a/b/c" as a string. The callback code becomes even simpler, then, as we do not have to worry about freeing an allocated buffer, nor rolling back our modification to the strbuf. This is theoretically less efficient, as some callbacks would not bother to format the final path component. But in practice this is not measurable. Since we use the same strbuf over and over, our work to grow it is amortized, and we really only pay to memcpy a few bytes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:28:36 +01:00			`show(obj, base->buf, cb_data);`
list-objects: convert name_path to a strbuf The "struct name_path" data is examined in only two places: we generate it in process_tree(), and we convert it to a single string in path_name(). Everyone else just passes it through to those functions. We can further note that process_tree() already keeps a single strbuf with the leading tree path, for use with tree_entry_interesting(). Instead of building a separate name_path linked list, let's just use the one we already build in "base". This reduces the amount of code (especially tricky code in path_name() which did not check for integer overflows caused by deep or large pathnames). It is also more efficient in some instances. Any time we were using tree_entry_interesting, we were building up the strbuf anyway, so this is an immediate and obvious win there. In cases where we were not, we trade off storing "pathname/" in a strbuf on the heap for each level of the path, instead of two pointers and an int on the stack (with one pointer into the tree object). On a 64-bit system, the latter is 20 bytes; so if path components are less than that on average, this has lower peak memory usage. In practice it probably doesn't matter either way; we are already holding in memory all of the tree objects leading up to each pathname, and for normal-depth pathnames, we are only talking about hundreds of bytes. This patch leaves "struct name_path" as a thin wrapper around the strbuf, to avoid disrupting callbacks. We should fix them, but leaving it out makes this diff easier to view. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:26:18 +01:00			`if (base->len)`
			`strbuf_addch(base, '/');`
Make rev-list --objects work together with pathspecs When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 14:26:47 +01:00
Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`init_tree_desc(&desc, tree->buffer, tree->size);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00
			`while (tree_entry(&desc, &entry)) {`
tree_entry_interesting(): give meaningful names to return values It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:10 +02:00			`if (match != all_entries_interesting) {`
Improve tree_entry_interesting() handling code t_e_i() can return -1 or 2 to early shortcut a search. Current code may use up to two variables to handle it. One for saving return value from t_e_i temporarily, one for saving return code 2. The second variable is not needed. If we make sure the first variable does not change until the next t_e_i() call, then we can do something like this: int ret = 0; while (...) { if (ret != 2) { ret = t_e_i(); if (ret < 0) /* no longer interesting / break; if (ret == 0) / skip this round / continue; } / ret > 0, interesting */ } Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-25 10:34:20 +01:00			`match = tree_entry_interesting(&entry, base, 0,`
			`&revs->diffopt.pathspec);`
tree_entry_interesting(): give meaningful names to return values It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:10 +02:00			`if (match == all_entries_not_interesting)`
Make rev-list --objects work together with pathspecs When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 14:26:47 +01:00			`break;`
tree_entry_interesting(): give meaningful names to return values It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:10 +02:00			`if (match == entry_not_interesting)`
Make rev-list --objects work together with pathspecs When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 14:26:47 +01:00			`continue;`
			`}`

Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`if (S_ISDIR(entry.mode))`
			`process_tree(revs,`
struct name_entry: use struct object_id instead of unsigned char sha1[20] Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-04-18 01:10:39 +02:00			`lookup_tree(entry.oid->hash),`
list-objects: convert name_path to a strbuf The "struct name_path" data is examined in only two places: we generate it in process_tree(), and we convert it to a single string in path_name(). Everyone else just passes it through to those functions. We can further note that process_tree() already keeps a single strbuf with the leading tree path, for use with tree_entry_interesting(). Instead of building a separate name_path linked list, let's just use the one we already build in "base". This reduces the amount of code (especially tricky code in path_name() which did not check for integer overflows caused by deep or large pathnames). It is also more efficient in some instances. Any time we were using tree_entry_interesting, we were building up the strbuf anyway, so this is an immediate and obvious win there. In cases where we were not, we trade off storing "pathname/" in a strbuf on the heap for each level of the path, instead of two pointers and an int on the stack (with one pointer into the tree object). On a 64-bit system, the latter is 20 bytes; so if path components are less than that on average, this has lower peak memory usage. In practice it probably doesn't matter either way; we are already holding in memory all of the tree objects leading up to each pathname, and for normal-depth pathnames, we are only talking about hundreds of bytes. This patch leaves "struct name_path" as a thin wrapper around the strbuf, to avoid disrupting callbacks. We should fix them, but leaving it out makes this diff easier to view. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:26:18 +01:00			`show, base, entry.path,`
list-objects: pass callback data to show_objects() The traverse_commit_list() API takes two callback functions, one to show commit objects, and the other to show other kinds of objects. Even though the former has a callback data parameter, so that the callback does not have to rely on global state, the latter does not. Give the show_objects() callback the same callback data parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-09-02 00:43:33 +02:00			`cb_data);`
rename dirlink to gitlink. Unify naming of plumbing dirlink/gitlink concept: git ls-files -z '*.[ch]' \| xargs -0 perl -pi -e 's/dirlink/gitlink/g;' -e 's/DIRLNK/GITLINK/g;' Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-21 22:08:28 +02:00			`else if (S_ISGITLINK(entry.mode))`
struct name_entry: use struct object_id instead of unsigned char sha1[20] Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-04-18 01:10:39 +02:00			`process_gitlink(revs, entry.oid->hash,`
list-objects: drop name_path entirely In the previous commit, we left name_path as a thin wrapper around a strbuf. This patch drops it entirely. As a result, every show_object_fn callback needs to be adjusted. However, none of their code needs to be changed at all, because the only use was to pass it to path_name(), which now handles the bare strbuf. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:26:44 +01:00			`show, base, entry.path,`
list-objects: pass callback data to show_objects() The traverse_commit_list() API takes two callback functions, one to show commit objects, and the other to show other kinds of objects. Even though the former has a callback data parameter, so that the callback does not have to rely on global state, the latter does not. Give the show_objects() callback the same callback data parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-09-02 00:43:33 +02:00			`cb_data);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`else`
			`process_blob(revs,`
struct name_entry: use struct object_id instead of unsigned char sha1[20] Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-04-18 01:10:39 +02:00			`lookup_blob(entry.oid->hash),`
list-objects: drop name_path entirely In the previous commit, we left name_path as a thin wrapper around a strbuf. This patch drops it entirely. As a result, every show_object_fn callback needs to be adjusted. However, none of their code needs to be changed at all, because the only use was to pass it to path_name(), which now handles the bare strbuf. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:26:44 +01:00			`show, base, entry.path,`
list-objects: pass callback data to show_objects() The traverse_commit_list() API takes two callback functions, one to show commit objects, and the other to show other kinds of objects. Even though the former has a callback data parameter, so that the callback does not have to rely on global state, the latter does not. Give the show_objects() callback the same callback data parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-09-02 00:43:33 +02:00			`cb_data);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`}`
Make rev-list --objects work together with pathspecs When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 14:26:47 +01:00			`strbuf_setlen(base, baselen);`
clear parsed flag when we free tree buffers Many code paths will free a tree object's buffer and set it to NULL after finishing with it in order to keep memory usage down during a traversal. However, out of 8 sites that do this, only one actually unsets the "parsed" flag back. Those sites that don't are setting a trap for later users of the tree object; even after calling parse_tree, the buffer will remain NULL, causing potential segfaults. It is not known whether this is triggerable in the current code. Most commands do not do an in-memory traversal followed by actually using the objects again. However, it does not hurt to be safe for future callers. In most cases, we can abstract this out to a "free_tree_buffer" helper. However, there are two exceptions: 1. The fsck code relies on the parsed flag to know that we were able to parse the object at one point. We can switch this to using a flag in the "flags" field. 2. The index-pack code sets the buffer to NULL but does not free it (it is freed by a caller). We should still unset the parsed flag here, but we cannot use our helper, as we do not want to free the buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-06 00:37:39 +02:00			`free_tree_buffer(tree);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`}`

pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`static void mark_edge_parents_uninteresting(struct commit *commit,`
			`struct rev_info *revs,`
			`show_edge_fn show_edge)`
			`{`
			`struct commit_list *parents;`

			`for (parents = commit->parents; parents; parents = parents->next) {`
			`struct commit *parent = parents->item;`
			`if (!(parent->object.flags & UNINTERESTING))`
			`continue;`
			`mark_tree_uninteresting(parent->tree);`
			`if (revs->edge_hint && !(parent->object.flags & SHOWN)) {`
			`parent->object.flags \|= SHOWN;`
			`show_edge(parent);`
			`}`
			`}`
			`}`

list-objects: reduce one argument in mark_edges_uninteresting mark_edges_uninteresting() is always called with this form mark_edges_uninteresting(revs->commits, revs, ...); Remove the first argument and let mark_edges_uninteresting figure that out by itself. It helps answer the question "are this commit list and revs related in any way?" when looking at mark_edges_uninteresting implementation. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-16 11:52:06 +02:00			`void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`{`
list-objects: reduce one argument in mark_edges_uninteresting mark_edges_uninteresting() is always called with this form mark_edges_uninteresting(revs->commits, revs, ...); Remove the first argument and let mark_edges_uninteresting figure that out by itself. It helps answer the question "are this commit list and revs related in any way?" when looking at mark_edges_uninteresting implementation. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-16 11:52:06 +02:00			`struct commit_list *list;`
list-objects: mark more commits as edges in mark_edges_uninteresting The purpose of edge commits is to let pack-objects know what objects it can use as base, but does not need to include in the thin pack because the other side is supposed to already have them. So far we mark uninteresting parents of interesting commits as edges. But even an unrelated uninteresting commit (that the other side has) may become a good base for pack-objects and help produce more efficient packs. This is especially true for shallow clone, when the client issues a fetch with a depth smaller or equal to the number of commits the server is ahead of the client. For example, in this commit history the client has up to "A" and the server has up to "B": -------A---B have--^ ^ / want--+ If depth 1 is requested, the commit list to send to the client includes only B. The way m_e_u is working, it checks if parent commits of B are uninteresting, if so mark them as edges. Due to shallow effect, commit B is grafted to have no parents and the revision walker never sees A as the parent of B. In fact it marks no edges at all in this simple case and sends everything B has to the client even if it could have excluded what A and also the client already have. In a slightly different case where A is not a direct parent of B (iow there are commits in between A and B), marking A as an edge can still save some because B may still have stuff from the far ancestor A. There is another case from the earlier patch, when we deepen a ref from C->E to A->E: ---A---B C---D---E want--^ ^ ^ shallow-+ / have-------+ In this case we need to send A and B to the client, and C (i.e. the current shallow point that the client informs the server) is a very good base because it's closet to A and B. Normal m_e_u won't recognize C as an edge because it only looks back to parents (i.e. A<-B) not the opposite way B->C even if C is already marked as uninteresting commit by the previous patch. This patch includes all uninteresting commits from command line as edges and lets pack-objects decide what's best to do. The upside is we have better chance of producing better packs in certain cases. The downside is we may need to process some extra objects on the server side. For the shallow case on git.git, when the client is 5 commits behind and does "fetch --depth=3", the result pack is 99.26 KiB instead of 4.92 MiB. Reported-and-analyzed-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-16 11:52:07 +02:00			`int i;`

list-objects: reduce one argument in mark_edges_uninteresting mark_edges_uninteresting() is always called with this form mark_edges_uninteresting(revs->commits, revs, ...); Remove the first argument and let mark_edges_uninteresting figure that out by itself. It helps answer the question "are this commit list and revs related in any way?" when looking at mark_edges_uninteresting implementation. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-16 11:52:06 +02:00			`for (list = revs->commits; list; list = list->next) {`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`struct commit *commit = list->item;`

			`if (commit->object.flags & UNINTERESTING) {`
			`mark_tree_uninteresting(commit->tree);`
rev-list: add an option to mark fewer edges as uninteresting In commit fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting - 2013-08-16), we marked an increasing number of edges uninteresting. This change, and the subsequent change to make this conditional on --objects-edge, are used by --thin to make much smaller packs for shallow clones. Unfortunately, they cause a significant performance regression when pushing non-shallow clones with lots of refs (23.322 seconds vs. 4.785 seconds with 22400 refs). Add an option to git rev-list, --objects-edge-aggressive, that preserves this more aggressive behavior, while leaving --objects-edge to provide more performant behavior. Preserve the current behavior for the moment by using the aggressive option. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-12-25 00:05:39 +01:00			`if (revs->edge_hint_aggressive && !(commit->object.flags & SHOWN)) {`
list-objects: mark more commits as edges in mark_edges_uninteresting The purpose of edge commits is to let pack-objects know what objects it can use as base, but does not need to include in the thin pack because the other side is supposed to already have them. So far we mark uninteresting parents of interesting commits as edges. But even an unrelated uninteresting commit (that the other side has) may become a good base for pack-objects and help produce more efficient packs. This is especially true for shallow clone, when the client issues a fetch with a depth smaller or equal to the number of commits the server is ahead of the client. For example, in this commit history the client has up to "A" and the server has up to "B": -------A---B have--^ ^ / want--+ If depth 1 is requested, the commit list to send to the client includes only B. The way m_e_u is working, it checks if parent commits of B are uninteresting, if so mark them as edges. Due to shallow effect, commit B is grafted to have no parents and the revision walker never sees A as the parent of B. In fact it marks no edges at all in this simple case and sends everything B has to the client even if it could have excluded what A and also the client already have. In a slightly different case where A is not a direct parent of B (iow there are commits in between A and B), marking A as an edge can still save some because B may still have stuff from the far ancestor A. There is another case from the earlier patch, when we deepen a ref from C->E to A->E: ---A---B C---D---E want--^ ^ ^ shallow-+ / have-------+ In this case we need to send A and B to the client, and C (i.e. the current shallow point that the client informs the server) is a very good base because it's closet to A and B. Normal m_e_u won't recognize C as an edge because it only looks back to parents (i.e. A<-B) not the opposite way B->C even if C is already marked as uninteresting commit by the previous patch. This patch includes all uninteresting commits from command line as edges and lets pack-objects decide what's best to do. The upside is we have better chance of producing better packs in certain cases. The downside is we may need to process some extra objects on the server side. For the shallow case on git.git, when the client is 5 commits behind and does "fetch --depth=3", the result pack is 99.26 KiB instead of 4.92 MiB. Reported-and-analyzed-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-16 11:52:07 +02:00			`commit->object.flags \|= SHOWN;`
			`show_edge(commit);`
			`}`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`continue;`
			`}`
			`mark_edge_parents_uninteresting(commit, revs, show_edge);`
			`}`
rev-list: add an option to mark fewer edges as uninteresting In commit fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting - 2013-08-16), we marked an increasing number of edges uninteresting. This change, and the subsequent change to make this conditional on --objects-edge, are used by --thin to make much smaller packs for shallow clones. Unfortunately, they cause a significant performance regression when pushing non-shallow clones with lots of refs (23.322 seconds vs. 4.785 seconds with 22400 refs). Add an option to git rev-list, --objects-edge-aggressive, that preserves this more aggressive behavior, while leaving --objects-edge to provide more performant behavior. Preserve the current behavior for the moment by using the aggressive option. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-12-25 00:05:39 +01:00			`if (revs->edge_hint_aggressive) {`
list-objects: only look at cmdline trees with edge_hint When rev-list is given a command-line like: git rev-list --objects $commit --not --all the most accurate answer is the difference between the set of objects reachable from $commit and the set reachable from all of the existing refs. However, we have not historically provided that answer, because it is very expensive to calculate. We would have to open every tree of every commit in the entire history. Instead, we find the accurate set difference of the reachable commits, and then mark the trees at the boundaries as uninteresting. This misses objects which appear in the trees of both the interesting commits and deep within the uninteresting history. Commit fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting, 2013-08-16) noticed that we miss those objects during pack-objects, and added code to examine the trees of all of the "--not" refs given on the command-line. Note that this is still not the complete set difference, because we look only at the tips of the command-line arguments, not all of their reachable commits. But it increases the set of boundary objects we consider, which is especially important for shallow fetches. So we are trading extra CPU time for a larger set of boundary objects, which can improve the resulting pack size for a --thin pack. This tradeoff probably makes sense in the context of pack-objects, where we have set revs->edge_hint to have the traversal feed us the set of boundary objects. For a regular rev-list, though, it is probably not a good tradeoff. It is true that it makes our list slightly closer to a true set difference, but it is a rare case where this is important. And because we do not have revs->edge_hint set, we do nothing useful with the larger set of boundary objects. This patch therefore ties the extra tree examination to the revs->edge_hint flag; it is the presence of that flag that makes the tradeoff worthwhile. Here is output from the p0001-rev-list showing the improvement in performance: Test HEAD^ HEAD ----------------------------------------------------------------------------------------- 0001.1: rev-list --all 0.69(0.65+0.02) 0.69(0.66+0.02) +0.0% 0001.2: rev-list --all --objects 3.22(3.19+0.03) 3.23(3.20+0.03) +0.3% 0001.4: rev-list $commit --not --all 0.04(0.04+0.00) 0.04(0.04+0.00) +0.0% 0001.5: rev-list --objects $commit --not --all 0.27(0.26+0.01) 0.04(0.04+0.00) -85.2% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-01-21 03:25:40 +01:00			`for (i = 0; i < revs->cmdline.nr; i++) {`
			`struct object *obj = revs->cmdline.rev[i].item;`
			`struct commit commit = (struct commit )obj;`
			`if (obj->type != OBJ_COMMIT \|\| !(obj->flags & UNINTERESTING))`
			`continue;`
			`mark_tree_uninteresting(commit->tree);`
			`if (!(obj->flags & SHOWN)) {`
			`obj->flags \|= SHOWN;`
			`show_edge(commit);`
			`}`
list-objects: mark more commits as edges in mark_edges_uninteresting The purpose of edge commits is to let pack-objects know what objects it can use as base, but does not need to include in the thin pack because the other side is supposed to already have them. So far we mark uninteresting parents of interesting commits as edges. But even an unrelated uninteresting commit (that the other side has) may become a good base for pack-objects and help produce more efficient packs. This is especially true for shallow clone, when the client issues a fetch with a depth smaller or equal to the number of commits the server is ahead of the client. For example, in this commit history the client has up to "A" and the server has up to "B": -------A---B have--^ ^ / want--+ If depth 1 is requested, the commit list to send to the client includes only B. The way m_e_u is working, it checks if parent commits of B are uninteresting, if so mark them as edges. Due to shallow effect, commit B is grafted to have no parents and the revision walker never sees A as the parent of B. In fact it marks no edges at all in this simple case and sends everything B has to the client even if it could have excluded what A and also the client already have. In a slightly different case where A is not a direct parent of B (iow there are commits in between A and B), marking A as an edge can still save some because B may still have stuff from the far ancestor A. There is another case from the earlier patch, when we deepen a ref from C->E to A->E: ---A---B C---D---E want--^ ^ ^ shallow-+ / have-------+ In this case we need to send A and B to the client, and C (i.e. the current shallow point that the client informs the server) is a very good base because it's closet to A and B. Normal m_e_u won't recognize C as an edge because it only looks back to parents (i.e. A<-B) not the opposite way B->C even if C is already marked as uninteresting commit by the previous patch. This patch includes all uninteresting commits from command line as edges and lets pack-objects decide what's best to do. The upside is we have better chance of producing better packs in certain cases. The downside is we may need to process some extra objects on the server side. For the shallow case on git.git, when the client is 5 commits behind and does "fetch --depth=3", the result pack is 99.26 KiB instead of 4.92 MiB. Reported-and-analyzed-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-16 11:52:07 +02:00			`}`
			`}`
pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`}`

process_{tree,blob}: show objects without buffering Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list, this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it) I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of obejcts are horrible from a memory allocation standpoint). But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-11 02:27:58 +02:00			`static void add_pending_tree(struct rev_info revs, struct tree tree)`
			`{`
			`add_pending_object(revs, &tree->object, "");`
			`}`

Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`void traverse_commit_list(struct rev_info *revs,`
list-objects: add "void *data" parameter to show functions The goal of this patch is to get rid of the "static struct rev_info revs" static variable in "builtin-rev-list.c". To do that, we need to pass the revs to the "show_commit" function in "builtin-rev-list.c" and this in turn means that the "traverse_commit_list" function in "list-objects.c" must be passed functions pointers to functions with 2 parameters instead of one. So we have to change all the callers and all the functions passed to "traverse_commit_list". Anyway this makes the code more clean and more generic, so it should be a good thing in the long run. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-06 21:28:36 +02:00			`show_commit_fn show_commit,`
			`show_object_fn show_object,`
			`void *data)`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`{`
			`int i;`
			`struct commit *commit;`
Make rev-list --objects work together with pathspecs When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 14:26:47 +01:00			`struct strbuf base;`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00
Make rev-list --objects work together with pathspecs When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 14:26:47 +01:00			`strbuf_init(&base, PATH_MAX);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`while ((commit = get_revision(revs)) != NULL) {`
list-objects.c: don't add an unparsed NULL as a pending tree "git rev-list --first-parent --boundary $commit^..$commit" segfaults on a merge commit since 8d2dfc4 (process_{tree,blob}: show objects without buffering, 2009-04-10), as it tried to dereference a commit that was discarded as UNINTERESTING without being parsed (hence lacking "tree"). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-14 20:29:50 +01:00			`/*`
			`* an uninteresting boundary commit may not have its tree`
			`* parsed yet, but we are not going to show them anyway`
			`*/`
			`if (commit->tree)`
			`add_pending_tree(revs, commit->tree);`
list-objects: add "void *data" parameter to show functions The goal of this patch is to get rid of the "static struct rev_info revs" static variable in "builtin-rev-list.c". To do that, we need to pass the revs to the "show_commit" function in "builtin-rev-list.c" and this in turn means that the "traverse_commit_list" function in "list-objects.c" must be passed functions pointers to functions with 2 parameters instead of one. So we have to change all the callers and all the functions passed to "traverse_commit_list". Anyway this makes the code more clean and more generic, so it should be a good thing in the long run. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-06 21:28:36 +02:00			`show_commit(commit, data);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`}`
			`for (i = 0; i < revs->pending.nr; i++) {`
			`struct object_array_entry *pending = revs->pending.objects + i;`
			`struct object *obj = pending->item;`
			`const char *name = pending->name;`
traverse_commit_list: support pending blobs/trees with paths When we call traverse_commit_list, we may have trees and blobs in the pending array. As we process these, we pass the "name" field from the pending entry as the path of the object within the tree (which then becomes the root path if we recurse into subtrees). When we set up the traversal in prepare_revision_walk, though, the "name" field of any pending trees and blobs is likely to be the ref at which we found the object. We would not want to make this part of the path (e.g., doing so would make "git rev-list --objects v2.6.11-tree" in linux.git show paths like "v2.6.11-tree/Makefile", which is nonsensical). Therefore prepare_revision_walk sets the name field of each pending tree and blobs to the empty string. However, this leaves no room for a caller who does know the correct path of a pending object to propagate that information to the revision walker. We can fix this by making two related changes: 1. Use the "path" field as the path instead of the "name" field in traverse_commit_list. If the path is not set, default to "" (which is what we always ended up with in the current code, because of prepare_revision_walk). 2. In prepare_revision_walk, make a complete copy of the entry. This makes the path field available to the walker (if there is one), solving our problem. Leaving the name field intact is now OK, as we do not use it as a path due to point (1) above (and we can use it to make more meaningful error messages if we want). We also make the original "mode" field available to the walker, though it does not actually use it. Note that we still re-add the pending objects and free the old ones (so we may strdup the path and name only to free the old ones). This could be made more efficient by simply copying the object_array entries that we are keeping. However, that would require more restructuring of the code, and is not done here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-10-16 00:43:19 +02:00			`const char *path = pending->path;`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`if (obj->flags & (UNINTERESTING \| SEEN))`
			`continue;`
			`if (obj->type == OBJ_TAG) {`
			`obj->flags \|= SEEN;`
list-objects: pass full pathname to callbacks When we find a blob at "a/b/c", we currently pass this to our show_object_fn callbacks as two components: "a/b/" and "c". Callbacks which want the full value then call path_name(), which concatenates the two. But this is an inefficient interface; the path is a strbuf, and we could simply append "c" to it temporarily, then roll back the length, without creating a new copy. So we could improve this by teaching the callsites of path_name() this trick (and there are only 3). But we can also notice that no callback actually cares about the broken-down representation, and simply pass each callback the full path "a/b/c" as a string. The callback code becomes even simpler, then, as we do not have to worry about freeing an allocated buffer, nor rolling back our modification to the strbuf. This is theoretically less efficient, as some callbacks would not bother to format the final path component. But in practice this is not measurable. Since we use the same strbuf over and over, our work to grow it is amortized, and we really only pay to memcpy a few bytes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:28:36 +01:00			`show_object(obj, name, data);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`continue;`
			`}`
traverse_commit_list: support pending blobs/trees with paths When we call traverse_commit_list, we may have trees and blobs in the pending array. As we process these, we pass the "name" field from the pending entry as the path of the object within the tree (which then becomes the root path if we recurse into subtrees). When we set up the traversal in prepare_revision_walk, though, the "name" field of any pending trees and blobs is likely to be the ref at which we found the object. We would not want to make this part of the path (e.g., doing so would make "git rev-list --objects v2.6.11-tree" in linux.git show paths like "v2.6.11-tree/Makefile", which is nonsensical). Therefore prepare_revision_walk sets the name field of each pending tree and blobs to the empty string. However, this leaves no room for a caller who does know the correct path of a pending object to propagate that information to the revision walker. We can fix this by making two related changes: 1. Use the "path" field as the path instead of the "name" field in traverse_commit_list. If the path is not set, default to "" (which is what we always ended up with in the current code, because of prepare_revision_walk). 2. In prepare_revision_walk, make a complete copy of the entry. This makes the path field available to the walker (if there is one), solving our problem. Leaving the name field intact is now OK, as we do not use it as a path due to point (1) above (and we can use it to make more meaningful error messages if we want). We also make the original "mode" field available to the walker, though it does not actually use it. Note that we still re-add the pending objects and free the old ones (so we may strdup the path and name only to free the old ones). This could be made more efficient by simply copying the object_array entries that we are keeping. However, that would require more restructuring of the code, and is not done here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-10-16 00:43:19 +02:00			`if (!path)`
			`path = "";`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`if (obj->type == OBJ_TREE) {`
process_{tree,blob}: show objects without buffering Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list, this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it) I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of obejcts are horrible from a memory allocation standpoint). But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-11 02:27:58 +02:00			`process_tree(revs, (struct tree *)obj, show_object,`
list-objects: convert name_path to a strbuf The "struct name_path" data is examined in only two places: we generate it in process_tree(), and we convert it to a single string in path_name(). Everyone else just passes it through to those functions. We can further note that process_tree() already keeps a single strbuf with the leading tree path, for use with tree_entry_interesting(). Instead of building a separate name_path linked list, let's just use the one we already build in "base". This reduces the amount of code (especially tricky code in path_name() which did not check for integer overflows caused by deep or large pathnames). It is also more efficient in some instances. Any time we were using tree_entry_interesting, we were building up the strbuf anyway, so this is an immediate and obvious win there. In cases where we were not, we trade off storing "pathname/" in a strbuf on the heap for each level of the path, instead of two pointers and an int on the stack (with one pointer into the tree object). On a 64-bit system, the latter is 20 bytes; so if path components are less than that on average, this has lower peak memory usage. In practice it probably doesn't matter either way; we are already holding in memory all of the tree objects leading up to each pathname, and for normal-depth pathnames, we are only talking about hundreds of bytes. This patch leaves "struct name_path" as a thin wrapper around the strbuf, to avoid disrupting callbacks. We should fix them, but leaving it out makes this diff easier to view. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:26:18 +01:00			`&base, path, data);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`continue;`
			`}`
			`if (obj->type == OBJ_BLOB) {`
process_{tree,blob}: show objects without buffering Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list, this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it) I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of obejcts are horrible from a memory allocation standpoint). But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-11 02:27:58 +02:00			`process_blob(revs, (struct blob *)obj, show_object,`
list-objects: pass full pathname to callbacks When we find a blob at "a/b/c", we currently pass this to our show_object_fn callbacks as two components: "a/b/" and "c". Callbacks which want the full value then call path_name(), which concatenates the two. But this is an inefficient interface; the path is a strbuf, and we could simply append "c" to it temporarily, then roll back the length, without creating a new copy. So we could improve this by teaching the callsites of path_name() this trick (and there are only 3). But we can also notice that no callback actually cares about the broken-down representation, and simply pass each callback the full path "a/b/c" as a string. The callback code becomes even simpler, then, as we do not have to worry about freeing an allocated buffer, nor rolling back our modification to the strbuf. This is theoretically less efficient, as some callbacks would not bother to format the final path component. But in practice this is not measurable. Since we use the same strbuf over and over, our work to grow it is amortized, and we really only pay to memcpy a few bytes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-11 23:28:36 +01:00			`&base, path, data);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`continue;`
			`}`
			`die("unknown pending object %s (%s)",`
Convert struct object to object_id struct object is one of the major data structures dealing with object IDs. Convert it to use struct object_id instead of an unsigned char array. Convert get_object_hash to refer to the new member as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net> 2015-11-10 03:22:28 +01:00			`oid_to_hex(&obj->oid), name);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`}`
object_array: add a "clear" function There's currently no easy way to free the memory associated with an object_array (and in most cases, we simply leak the memory in a rev_info's pending array). Let's provide a helper to make this easier to handle. We can make use of it in list-objects.c, which does the same thing by hand (but fails to free the "name" field of each entry, potentially leaking memory). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-10-16 00:34:34 +02:00			`object_array_clear(&revs->pending);`
Make rev-list --objects work together with pathspecs When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 14:26:47 +01:00			`strbuf_release(&base);`
Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`}`