mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-18 06:54:55 +01:00

212 lines

5.3 KiB

C

Separate object listing routines out of rev-list Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-05 06:50:12 +02:00			`#include "cache.h"`
			`#include "tag.h"`
			`#include "commit.h"`
			`#include "tree.h"`
			`#include "blob.h"`
			`#include "diff.h"`
			`#include "tree-walk.h"`
			`#include "revision.h"`
			`#include "list-objects.h"`

			`static void process_blob(struct rev_info *revs,`
			`struct blob *blob,`
process_{tree,blob}: show objects without buffering Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list), this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it). I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of objects are horrible from a memory allocation standpoint). 
But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-11 02:27:58 +02:00			`show_object_fn show,`
Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00			`struct name_path *path,`
			`const char *name)`
			`{`
			`struct object *obj = &blob->object;`

			`if (!revs->blob_objects)`
			`return;`
list-objects.c::process_tree/blob: check for NULL As these functions are directly called with the result from lookup_tree/blob, they must handle NULL. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-18 21:47:56 +01:00			`if (!obj)`
			`die("bad blob object");`
Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00			`if (obj->flags & (UNINTERESTING | SEEN))`
			`return;`
			`obj->flags |= SEEN;`
show_object(): push path_name() call further down In particular, pushing the "path_name()" call _into_ the show() function would seem to allow - more clarity into who "owns" the name (ie now when we free the name in the show_object callback, it's because we generated it ourselves by calling path_name()) - not calling path_name() at all, either because we don't care about the name in the first place, or because we are actually happy walking the linked list of "struct name_path *" and the last component. Now, I didn't do that latter optimization, because it would require some more coding, but especially looking at "builtin-pack-objects.c", we really don't even want the whole pathname, we really would be better off with the list of path components. Why? We use that name for two things: - add_preferred_base_object(), which actually _wants_ to traverse the path, and now does it by looking for '/' characters! - for 'name_hash()', which only cares about the last 16 characters of a name, so again, generating the full name seems to be just unnecessary work. Anyway, so I didn't look any closer at those things, but it did convince me that the "show_object()" calling convention was crazy, and we're actually better off doing _less_ in list-objects.c, and giving people access to the internal data structures so that they can decide whether they want to generate a path-name or not. This patch does that, and then for people who did use the name (even if they might do something more clever in the future), it just does the straightforward "name = path_name(path, component); .. free(name);" thing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-11 03:15:26 +02:00			`show(obj, path, name);`
Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00			`}`
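
The "push path_name() call further down" commit message above describes callbacks receiving the chain of `struct name_path` components plus the final component, and building a full path only when they need one. The following is a minimal sketch of that idea, not git's actual `path_name()`: the struct is a simplified stand-in using the `up`, `elem`, and `elem_len` fields seen in this file, and `build_path()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for git's struct name_path (fields as used in this file). */
struct name_path {
	struct name_path *up;   /* parent directory component, or NULL */
	const char *elem;       /* this component's name */
	int elem_len;           /* strlen(elem); 0 for the root */
};

/*
 * Hypothetical helper: join the parent chain and the final component
 * into "dir/sub/name". The caller owns the result and must free() it.
 */
static char *build_path(const struct name_path *path, const char *name)
{
	const struct name_path *p;
	size_t name_len = strlen(name);
	size_t len = name_len + 1; /* final component plus NUL */
	char *buf, *end;

	/* First pass: measure every non-empty component plus its '/'. */
	for (p = path; p; p = p->up)
		if (p->elem_len)
			len += (size_t)p->elem_len + 1;

	buf = malloc(len);
	assert(buf != NULL);

	/* Second pass: fill from the back, innermost component first. */
	end = buf + len - 1 - name_len;
	memcpy(end, name, name_len + 1);
	for (p = path; p; p = p->up) {
		if (p->elem_len) {
			*--end = '/';
			end -= p->elem_len;
			memcpy(end, p->elem, (size_t)p->elem_len);
		}
	}
	return buf;
}
```

A callback that only needs the last component, or wants to walk the components itself, can skip this allocation entirely, which is the point the commit message makes about ownership and unnecessary work.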

Teach git list-objects logic to not follow gitlinks This allows us to pack superprojects and thus clone them (but not yet check them out on the receiving side.. That's the next patch) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 18:25:01 +02:00			`/*`
			`* Processing a gitlink entry currently does nothing, since`
			`* we do not recurse into the subproject.`
			`*`
			`* We could eventually add a flag that actually does that,`
			`* which would involve:`
			`* - is the subproject actually checked out?`
			`* - if so, see if the subproject has already been added`
			`* to the alternates list, and add it if not.`
			`* - process the commit (or tag) the gitlink points to`
			`* recursively.`
			`*`
			`* However, it's unclear whether there is really ever any`
			`* reason to see superprojects and subprojects as such a`
			`* "unified" object pool (potentially resulting in a totally`
			`* humongous pack - avoiding which was the whole point of`
			`* having gitlinks in the first place!).`
			`*`
			`* So for now, there is just a note that we could follow`
			`* the link, and how to do it. Whether it necessarily makes`
			`* any sense what-so-ever to ever do that is another issue.`
			`*/`
			`static void process_gitlink(struct rev_info *revs,`
			`const unsigned char *sha1,`
process_{tree,blob}: show objects without buffering 2009-04-11 02:27:58 +02:00			`show_object_fn show,`
Teach git list-objects logic to not follow gitlinks 2007-04-13 18:25:01 +02:00			`struct name_path *path,`
			`const char *name)`
			`{`
			`/* Nothing to do */`
			`}`

Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00			`static void process_tree(struct rev_info *revs,`
			`struct tree *tree,`
process_{tree,blob}: show objects without buffering 2009-04-11 02:27:58 +02:00			`show_object_fn show,`
Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00			`struct name_path *path,`
Make rev-list --objects work together with pathspecs When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 14:26:47 +01:00			`struct strbuf *base,`
Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00			`const char *name)`
			`{`
			`struct object *obj = &tree->object;`
			`struct tree_desc desc;`
			`struct name_entry entry;`
			`struct name_path me;`
Improve tree_entry_interesting() handling code t_e_i() can return -1 or 2 to early shortcut a search. Current code may use up to two variables to handle it. One for saving return value from t_e_i temporarily, one for saving return code 2. The second variable is not needed. If we make sure the first variable does not change until the next t_e_i() call, then we can do something like this: int ret = 0; while (...) { if (ret != 2) { ret = t_e_i(); if (ret < 0) /* no longer interesting */ break; if (ret == 0) /* skip this round */ continue; } /* ret > 0, interesting */ } Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-25 10:34:20 +01:00			`int match = revs->diffopt.pathspec.nr == 0 ? 2 : 0;`
Make rev-list --objects work together with pathspecs 2010-12-17 14:26:47 +01:00			`int baselen = base->len;`
Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00
			`if (!revs->tree_objects)`
			`return;`
list-objects.c::process_tree/blob: check for NULL 2008-02-18 21:47:56 +01:00			`if (!obj)`
			`die("bad tree object");`
Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00			`if (obj->flags & (UNINTERESTING | SEEN))`
			`return;`
			`if (parse_tree(tree) < 0)`
			`die("bad tree object %s", sha1_to_hex(obj->sha1));`
			`obj->flags |= SEEN;`
show_object(): push path_name() call further down 2009-04-11 03:15:26 +02:00			`show(obj, path, name);`
Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00			`me.up = path;`
			`me.elem = name;`
			`me.elem_len = strlen(name);`

Improve tree_entry_interesting() handling code 2011-03-25 10:34:20 +01:00			`if (!match) {`
Make rev-list --objects work together with pathspecs 2010-12-17 14:26:47 +01:00			`strbuf_addstr(base, name);`
			`if (base->len)`
			`strbuf_addch(base, '/');`
			`}`

Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`init_tree_desc(&desc, tree->buffer, tree->size);`
Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00
			`while (tree_entry(&desc, &entry)) {`
Improve tree_entry_interesting() handling code 2011-03-25 10:34:20 +01:00			`if (match != 2) {`
			`match = tree_entry_interesting(&entry, base, 0,`
			`&revs->diffopt.pathspec);`
			`if (match < 0)`
Make rev-list --objects work together with pathspecs 2010-12-17 14:26:47 +01:00			`break;`
Improve tree_entry_interesting() handling code 2011-03-25 10:34:20 +01:00			`if (match == 0)`
Make rev-list --objects work together with pathspecs 2010-12-17 14:26:47 +01:00			`continue;`
			`}`

Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00			`if (S_ISDIR(entry.mode))`
			`process_tree(revs,`
			`lookup_tree(entry.sha1),`
Make rev-list --objects work together with pathspecs 2010-12-17 14:26:47 +01:00			`show, &me, base, entry.path);`
rename dirlink to gitlink. Unify naming of plumbing dirlink/gitlink concept: git ls-files -z '*.[ch]' | xargs -0 perl -pi -e 's/dirlink/gitlink/g;' -e 's/DIRLNK/GITLINK/g;' Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-21 22:08:28 +02:00			`else if (S_ISGITLINK(entry.mode))`
Teach git list-objects logic to not follow gitlinks 2007-04-13 18:25:01 +02:00			`process_gitlink(revs, entry.sha1,`
process_{tree,blob}: show objects without buffering 2009-04-11 02:27:58 +02:00			`show, &me, entry.path);`
Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00			`else`
			`process_blob(revs,`
			`lookup_blob(entry.sha1),`
process_{tree,blob}: show objects without buffering 2009-04-11 02:27:58 +02:00			`show, &me, entry.path);`
Separate object listing routines out of rev-list 2006-09-05 06:50:12 +02:00			`}`
Make rev-list --objects work together with pathspecs When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 14:26:47 +01:00			`strbuf_setlen(base, baselen);`
			`free(tree->buffer);`
			`tree->buffer = NULL;`
			`}`
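The `strbuf_setlen(base, baselen)` call near the end of the function above resets the shared path buffer to the length it had on entry, so one buffer can serve the whole recursive walk. A minimal sketch of that pattern follows. It assumes a fixed-size buffer instead of git's `strbuf`, and `path_buf`, `path_add`, `path_setlen`, `node`, and `walk` are hypothetical names that do not exist in git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's strbuf: a fixed-capacity path buffer. */
struct path_buf {
	char buf[256];
	size_t len;
};

/* Hypothetical tree node: a name plus a NULL-terminated child array. */
struct node {
	const char *name;
	const struct node **children; /* NULL for a leaf */
};

/* Append s to the buffer; the caller guarantees that it fits. */
static void path_add(struct path_buf *p, const char *s)
{
	size_t n = strlen(s);

	assert(p->len + n < sizeof(p->buf));
	memcpy(p->buf + p->len, s, n + 1);
	p->len += n;
}

/* Truncate back to an earlier length, in the spirit of strbuf_setlen(). */
static void path_setlen(struct path_buf *p, size_t len)
{
	assert(len <= p->len);
	p->len = len;
	p->buf[len] = '\0';
}

/* Count leaves, extending and then restoring the shared buffer. */
static int walk(const struct node *n, struct path_buf *base)
{
	size_t baselen = base->len;
	const struct node **c;
	int leaves = 0;

	path_add(base, n->name);
	if (!n->children) {
		leaves = 1;
	} else {
		path_add(base, "/");
		for (c = n->children; *c; c++)
			leaves += walk(*c, base);
	}
	path_setlen(base, baselen); /* undo everything this call appended */
	return leaves;
}
```

Because every call truncates the buffer back to its entry length, the caller's prefix is intact when the recursion returns, and no per-level path copies are needed.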

pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-06 10:42:23 +02:00			`static void mark_edge_parents_uninteresting(struct commit *commit,`
			`struct rev_info *revs,`
			`show_edge_fn show_edge)`
			`{`
			`struct commit_list *parents;`

			`for (parents = commit->parents; parents; parents = parents->next) {`
			`struct commit *parent = parents->item;`
			`if (!(parent->object.flags & UNINTERESTING))`
			`continue;`
			`mark_tree_uninteresting(parent->tree);`
			`if (revs->edge_hint && !(parent->object.flags & SHOWN)) {`
			`parent->object.flags |= SHOWN;`
			`show_edge(parent);`
			`}`
			`}`
			`}`

			`void mark_edges_uninteresting(struct commit_list *list,`
			`struct rev_info *revs,`
			`show_edge_fn show_edge)`
			`{`
			`for ( ; list; list = list->next) {`
			`struct commit *commit = list->item;`

			`if (commit->object.flags & UNINTERESTING) {`
			`mark_tree_uninteresting(commit->tree);`
			`continue;`
			`}`
			`mark_edge_parents_uninteresting(commit, revs, show_edge);`
			`}`
			`}`
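The two functions above report each uninteresting parent of an interesting commit at most once, using the `SHOWN` flag to suppress repeats. A simplified sketch of that logic follows. It assumes toy types (`toy_commit`, `toy_edge_fn`, `toy_mark_edges`, and `count_edge` are hypothetical), it omits the tree marking and the `edge_hint` check, and it adds a `void *data` argument to the callback that the original `show_edge` does not take.

```c
#include <assert.h>
#include <stddef.h>

#define UNINTERESTING (1u << 0)
#define SHOWN         (1u << 1)

/* Hypothetical toy commit: flags plus up to two parents. */
struct toy_commit {
	unsigned flags;
	struct toy_commit *parents[2];
	int nr_parents;
};

/* Hypothetical callback type; the data pointer is an addition of this sketch. */
typedef void (*toy_edge_fn)(struct toy_commit *commit, void *data);

/*
 * Report each uninteresting parent of an interesting commit exactly once.
 * The SHOWN flag prevents duplicate reports when several interesting
 * commits share the same uninteresting parent.
 */
static void toy_mark_edges(struct toy_commit **list, int nr,
			   toy_edge_fn show_edge, void *data)
{
	int i, j;

	assert(show_edge != NULL);
	for (i = 0; i < nr; i++) {
		struct toy_commit *c = list[i];

		if (c->flags & UNINTERESTING)
			continue;
		for (j = 0; j < c->nr_parents; j++) {
			struct toy_commit *p = c->parents[j];

			if (!(p->flags & UNINTERESTING))
				continue;
			if (!(p->flags & SHOWN)) {
				p->flags |= SHOWN;
				show_edge(p, data);
			}
		}
	}
}

/* Example callback: count the edges reported. */
static void count_edge(struct toy_commit *commit, void *data)
{
	(void)commit;
	++*(int *)data;
}
```

With two interesting commits that share one uninteresting parent, `count_edge` runs once, because the first report sets `SHOWN` on the parent.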

			`static void add_pending_tree(struct rev_info *revs, struct tree *tree)`
			`{`
			`add_pending_object(revs, &tree->object, "");`
			`}`

			`void traverse_commit_list(struct rev_info *revs,`
list-objects: add "void *data" parameter to show functions The goal of this patch is to get rid of the "static struct rev_info revs" static variable in "builtin-rev-list.c". To do that, we need to pass the revs to the "show_commit" function in "builtin-rev-list.c" and this in turn means that the "traverse_commit_list" function in "list-objects.c" must be passed functions pointers to functions with 2 parameters instead of one. So we have to change all the callers and all the functions passed to "traverse_commit_list". Anyway this makes the code more clean and more generic, so it should be a good thing in the long run. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-06 21:28:36 +02:00			`show_commit_fn show_commit,`
			`show_object_fn show_object,`
			`void *data)`
			`{`
			`int i;`
			`struct commit *commit;`
			`struct strbuf base;`

			`strbuf_init(&base, PATH_MAX);`
			`while ((commit = get_revision(revs)) != NULL) {`
list-objects.c: don't add an unparsed NULL as a pending tree "git rev-list --first-parent --boundary $commit^..$commit" segfaults on a merge commit since 8d2dfc4 (process_{tree,blob}: show objects without buffering, 2009-04-10), as it tried to dereference a commit that was discarded as UNINTERESTING without being parsed (hence lacking "tree"). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-14 20:29:50 +01:00			`/*`
			`* an uninteresting boundary commit may not have its tree`
			`* parsed yet, but we are not going to show them anyway`
			`*/`
			`if (commit->tree)`
			`add_pending_tree(revs, commit->tree);`
			`show_commit(commit, data);`
			`}`
			`for (i = 0; i < revs->pending.nr; i++) {`
			`struct object_array_entry *pending = revs->pending.objects + i;`
			`struct object *obj = pending->item;`
			`const char *name = pending->name;`
			`if (obj->flags & (UNINTERESTING | SEEN))`
			`continue;`
			`if (obj->type == OBJ_TAG) {`
			`obj->flags |= SEEN;`
show_object(): push path_name() call further down In particular, pushing the "path_name()" call _into_ the show() function would seem to allow - more clarity into who "owns" the name (ie now when we free the name in the show_object callback, it's because we generated it ourselves by calling path_name()) - not calling path_name() at all, either because we don't care about the name in the first place, or because we are actually happy walking the linked list of "struct name_path *" and the last component. Now, I didn't do that latter optimization, because it would require some more coding, but especially looking at "builtin-pack-objects.c", we really don't even want the whole pathname, we really would be better off with the list of path components. Why? We use that name for two things: - add_preferred_base_object(), which actually _wants_ to traverse the path, and now does it by looking for '/' characters! - for 'name_hash()', which only cares about the last 16 characters of a name, so again, generating the full name seems to be just unnecessary work. Anyway, so I didn't look any closer at those things, but it did convince me that the "show_object()" calling convention was crazy, and we're actually better off doing _less_ in list-objects.c, and giving people access to the internal data structures so that they can decide whether they want to generate a path-name or not. This patch does that, and then for people who did use the name (even if they might do something more clever in the future), it just does the straightforward "name = path_name(path, component); .. free(name);" thing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-11 03:15:26 +02:00			`show_object(obj, NULL, name);`
			`continue;`
			`}`
			`if (obj->type == OBJ_TREE) {`
			`process_tree(revs, (struct tree *)obj, show_object,`
			`NULL, &base, name);`
			`continue;`
			`}`
			`if (obj->type == OBJ_BLOB) {`
			`process_blob(revs, (struct blob *)obj, show_object,`
			`NULL, name);`
			`continue;`
			`}`
			`die("unknown pending object %s (%s)",`
			`sha1_to_hex(obj->sha1), name);`
			`}`
Fix memory leak in traverse_commit_list If we were listing objects too then the objects were buffered in an array only reachable from a stack allocated structure. When this function returns that array would be leaked as nobody would have a reference to it anymore. Historically this hasn't been a problem as the primary user of traverse_commit_list() (the noble git-rev-list) would terminate as soon as the function was finished, thus allowing the operating system to cleanup memory. However we have been leaking this data in git-pack-objects ever since that program learned how to run the revision listing internally, rather than relying on reading object names from git-rev-list. To better facilitate reuse of traverse_commit_list during other builtin tools (such as git-fetch) we shouldn't leak temporary memory like this and instead we need to clean up properly after ourselves. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-09 12:06:10 +01:00			`if (revs->pending.nr) {`
			`free(revs->pending.objects);`
			`revs->pending.nr = 0;`
			`revs->pending.alloc = 0;`
			`revs->pending.objects = NULL;`
			`}`
			`strbuf_release(&base);`
			`}`
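Two ideas from the commit messages above come together in `traverse_commit_list()`: caller state reaches the callbacks through a `void *data` argument rather than a static variable, and the pending array is freed and fully reset once it has been walked. A minimal sketch of both follows. It assumes a toy array of `int` items, and `toy_array`, `toy_push`, `toy_drain`, and `add_to_sum` are hypothetical names that do not exist in git. Unlike the original, which frees only when `pending.nr` is nonzero, the sketch frees unconditionally, relying on `free(NULL)` being a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array, loosely modeled on the nr/alloc/objects triple. */
struct toy_array {
	int *items;
	size_t nr, alloc;
};

/* Hypothetical callback type that receives caller state through data. */
typedef void (*toy_show_fn)(int item, void *data);

/* Append one item, growing the allocation as needed. Returns 0 on success. */
static int toy_push(struct toy_array *a, int item)
{
	if (a->nr == a->alloc) {
		size_t alloc = a->alloc ? a->alloc * 2 : 4;
		int *items = realloc(a->items, alloc * sizeof(*items));

		if (!items)
			return -1;
		a->items = items;
		a->alloc = alloc;
	}
	a->items[a->nr++] = item;
	return 0;
}

/*
 * Show every queued item through the callback, then release the array
 * and reset all three fields so the struct can be reused safely.
 */
static void toy_drain(struct toy_array *a, toy_show_fn show, void *data)
{
	size_t i;

	assert(show != NULL);
	for (i = 0; i < a->nr; i++)
		show(a->items[i], data);
	free(a->items);
	a->items = NULL;
	a->nr = 0;
	a->alloc = 0;
}

/* Example callback: accumulate into a caller-owned sum via data. */
static void add_to_sum(int item, void *data)
{
	*(long *)data += item;
}
```

Resetting `nr`, `alloc`, and the pointer together is what keeps a later push from writing through a stale pointer, so the same struct can back a second traversal.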