mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-18 15:04:49 +01:00

373 lines

8.3 KiB

C

Raw Normal View History

Add "rev-list" program that uses the new time-based commit listing. This is probably what you'd want to see for "git log". 2005-04-24 04:04:40 +02:00			`#include "cache.h"`
upload-pack: Do not choke on too many heads request. Cloning from a repository with more than 256 refs (heads and tags included) will choke, because upload-pack has a built-in limit of feeding not more than MAX_NEEDS (currently 256) heads to underlying git-rev-list. This is a problem when cloning a repository with many tags, like http://www.linux-mips.org/pub/scm/linux.git, which has 290+ tags. This commit introduces a new flag, --all, to git-rev-list, to include all refs in the repository. Updated upload-pack detects requests that ask more than MAX_NEEDS refs, and sends everything back instead. We may probably want to tweak the definitions of MAX_NEEDS and MAX_HAS, but that is a separate topic. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-05 23:49:54 +02:00			`#include "refs.h"`
Teach git-rev-list about non-commit objects Now you can give git-rev-list tags, trees and blobs, and it will do the proper reachability for them all. Knock wood. Of course, you need the "--objects" flag to do anything but plain commits. 2005-06-29 20:30:24 +02:00			`#include "tag.h"`
Add "rev-list" program that uses the new time-based commit listing. This is probably what you'd want to see for "git log". 2005-04-24 04:04:40 +02:00			`#include "commit.h"`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`#include "tree.h"`
			`#include "blob.h"`
tree/diff header cleanup. Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-30 08:55:43 +02:00			`#include "tree-walk.h"`
blame and friends: adjust to multiple pathspec change. This makes things that include revision.h build again. Blame is also built, but I am not sure how well it works (or how well it worked to begin with) -- it was relying on tree-diff to be using whatever pathspec was used the last time, which smells a bit suspicious. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-11 03:14:54 +02:00			`#include "diff.h"`
First cut at libifying revlist generation This really just splits things up partially, and creates the interface to set things up by parsing the command line. No real code changes so far, although the parsing of filenames is a bit stricter. In particular, if there is a "--", then we do not accept any filenames before it, and if there isn't any "--", then we check that _all_ paths listed are valid, not just the first one. The new argument parsing automatically also gives us "--default" and "--not" handling as in git-rev-parse. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 01:19:46 +01:00			`#include "revision.h"`
Make "git rev-list" be a builtin This was surprisingly easy. The diff is truly minimal: rename "main()" to "cmd_rev_list()" in rev-list.c, and rename the whole file to reflect its new built-in status. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 23:19:20 +02:00			`#include "builtin.h"`
First cut at libifying revlist generation This really just splits things up partially, and creates the interface to set things up by parsing the command line. No real code changes so far, although the parsing of filenames is a bit stricter. In particular, if there is a "--", then we do not accept any filenames before it, and if there isn't any "--", then we check that _all_ paths listed are valid, not just the first one. The new argument parsing automatically also gives us "--default" and "--not" handling as in git-rev-parse. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 01:19:46 +01:00
rev-list --boundary: show boundary commits even when limited otherwise. The boundary commits are shown for UI like gitk to draw them as soon as topo-order sorting allows, and should not be omitted by get_revision() filtering logic. As long as their immediate child commits are shown, we should not filter them out. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-17 03:12:49 +02:00			`/* bits #0-15 in revision.h */`
Add "rev-list" program that uses the new time-based commit listing. This is probably what you'd want to see for "git log". 2005-04-24 04:04:40 +02:00
rev-list --boundary: show boundary commits even when limited otherwise. The boundary commits are shown for UI like gitk to draw them as soon as topo-order sorting allows, and should not be omitted by get_revision() filtering logic. As long as their immediate child commits are shown, we should not filter them out. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-17 03:12:49 +02:00			`#define COUNTED (1u<<16)`
git-rev-list: use proper lazy reachability analysis This mean sthat you can give a beginning/end pair to git-rev-list, and it will show all entries that are reachable from the beginning but not the end. For example git-rev-list v2.6.12-rc5 v2.6.12-rc4 shows all commits that are in -rc5 but are not in -rc4. 2005-05-31 03:46:32 +02:00
git-rev-list: add "end" commit and "--header" flag The "end" commit is just faking it right now, it's sorting things purely by date, so this is _not_ a reachability analysis. Some day. The "--header" flag causes the commit message to be printed out, with a NUL character separator after it for parseability. This allows you to do things like use "grep -z" to grep for certain authors etc. 2005-05-26 03:29:09 +02:00			`static const char rev_list_usage[] =`
Update usage string and documentation for git-rev-list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-30 10:03:45 +01:00			`"git-rev-list [OPTION] <commit-id>... [ -- paths... ]\n"`
			`" limiting output:\n"`
			`" --max-count=nr\n"`
			`" --max-age=epoch\n"`
			`" --min-age=epoch\n"`
			`" --sparse\n"`
			`" --no-merges\n"`
rev-list --remove-empty: add minimum help and doc entry. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-01-27 10:39:24 +01:00			`" --remove-empty\n"`
Update usage string and documentation for git-rev-list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-30 10:03:45 +01:00			`" --all\n"`
			`" ordering output:\n"`
			`" --topo-order\n"`
topo-order: make --date-order optional. This adds --date-order to rev-list; it is similar to topo order in the sense that no parent comes before all of its children, but otherwise things are still ordered in the commit timestamp order. The same flag is also added to show-branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 07:05:33 +01:00			`" --date-order\n"`
Update usage string and documentation for git-rev-list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-30 10:03:45 +01:00			`" formatting output:\n"`
			`" --parents\n"`
rev-list --objects-edge This new flag is similar to --objects, but causes rev-list to show list of "uninteresting" commits that appear on the edge commit prefixed with '-'. Downstream pack-objects will be changed to take these as hints to use the trees and blobs contained with them as base objects of resulting pack, producing an incomplete (not self-contained) pack. Such a pack cannot be used in .git/objects/pack (it is prevented by git-index-pack erroring out if it is fed to git-fetch-pack -k or git-clone-pack), but would be useful when transferring only small changes to huge blobs. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 12:32:31 +01:00			`" --objects \| --objects-edge\n"`
Update usage string and documentation for git-rev-list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-30 10:03:45 +01:00			`" --unpacked\n"`
			`" --header \| --pretty\n"`
rev-list: default to abbreviate merge parent names under --pretty. When we prettyprint commit log messages, merge parent names were often very long and there was no way to abbreviate it. This changes them to be abbreviated by default, and non-default abbreviations can be specified with --no-abbrev or --abbrev=<n> options. Note that this affects only the prettyprinted parent names. The output from --show-parents is meant for machine consumption and is not affected by this flag. 2006-02-10 20:56:42 +01:00			`" --abbrev=nr \| --no-abbrev\n"`
rev-list --abbrev-commit This should make --pretty=oneline a whole lot more readable for people using 80-column terminals. Originally from Eric Wong. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-07 06:32:36 +02:00			`" --abbrev-commit\n"`
Update usage string and documentation for git-rev-list. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-30 10:03:45 +01:00			`" special purpose:\n"`
			`" --bisect"`
			`;`
git-rev-list: add "end" commit and "--header" flag The "end" commit is just faking it right now, it's sorting things purely by date, so this is _not_ a reachability analysis. Some day. The "--header" flag causes the commit message to be printed out, with a NUL character separator after it for parseability. This allows you to do things like use "grep -z" to grep for certain authors etc. 2005-05-26 03:29:09 +02:00
Make "git rev-list" be a builtin This was surprisingly easy. The diff is truly minimal: rename "main()" to "cmd_rev_list()" in rev-list.c, and rename the whole file to reflect its new built-in status. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 23:19:20 +02:00			`static struct rev_info revs;`
First cut at libifying revlist generation This really just splits things up partially, and creates the interface to set things up by parsing the command line. No real code changes so far, although the parsing of filenames is a bit stricter. In particular, if there is a "--", then we do not accept any filenames before it, and if there isn't any "--", then we check that _all_ paths listed are valid, not just the first one. The new argument parsing automatically also gives us "--default" and "--not" handling as in git-rev-parse. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 01:19:46 +01:00
git-rev-list: add "--bisect" flag to find the "halfway" point This is useful for doing binary searching for problems. You start with a known good and known bad point, and you then test the "halfway" point in between: git-rev-list --bisect bad ^good and you test that. If that one tests good, you now still have a known bad case, but two known good points, and you can bisect again: git-rev-list --bisect bad ^good1 ^good2 and test that point. If that point is bad, you now use that as your known-bad starting point: git-rev-list --bisect newbad ^good1 ^good2 and basically at every iteration you shrink your list of commits by half: you're binary searching for the point where the troubles started, even though there isn't a nice linear ordering. 2005-06-18 07:54:50 +02:00			`static int bisect_list = 0;`
rev-list --timestamp This prefixes the raw commit timestamp to the output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-22 09:22:00 +01:00			`static int show_timestamp = 0;`
git-rev-list: factor out the commit printing from "main()" Functions that do many things are bad. We should basically just parse the arguments in main(). We're not quite there yet, but it's a step in the right direction. 2005-06-02 18:19:53 +02:00			`static int hdr_termination = 0;`
Log message printout cleanups On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-17 20:59:32 +02:00			`static const char *header_prefix;`
rev-list --objects: use full pathname to help hashing. This helps to group the same files from different revs together, while spreading files with the same basename in different directories, to help pack-object. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00
git-rev-list: factor out the commit printing from "main()" Functions that do many things are bad. We should basically just parse the arguments in main(). We're not quite there yet, but it's a step in the right direction. 2005-06-02 18:19:53 +02:00			`static void show_commit(struct commit *commit)`
			`{`
rev-list --timestamp This prefixes the raw commit timestamp to the output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-22 09:22:00 +01:00			`if (show_timestamp)`
			`printf("%lu ", commit->date);`
Log message printout cleanups On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-17 20:59:32 +02:00			`if (header_prefix)`
			`fputs(header_prefix, stdout);`
rev-list --boundary With the new --boundary flag, the output from rev-list includes the UNINTERESING commits at the boundary, which are usually not shown. Their object names are prefixed with '-'. For example, with this graph: C side / A---B---D master You would get something like this: $ git rev-list --boundary --header --parents side..master D B tree D^{tree} parent B ... log message for commit D here ... \0-B A tree B^{tree} parent A ... log message for commit B here ... \0 Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-28 09:58:34 +02:00			`if (commit->object.flags & BOUNDARY)`
			`putchar('-');`
rev-list option parser fix. The big option parser unification broke rev-list the big way; this makes it use options from the parsed revs structure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-16 08:48:27 +02:00			`if (revs.abbrev_commit && revs.abbrev)`
			`fputs(find_unique_abbrev(commit->object.sha1, revs.abbrev),`
			`stdout);`
rev-list --abbrev-commit This should make --pretty=oneline a whole lot more readable for people using 80-column terminals. Originally from Eric Wong. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-07 06:32:36 +02:00			`else`
			`fputs(sha1_to_hex(commit->object.sha1), stdout);`
Move "--parent" parsing into generic revision.c library code Not only do we do it in both rev-list.c and git.c, the revision walking code will soon want to know whether we should rewrite parenthood information or not. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-31 02:52:42 +02:00			`if (revs.parents) {`
git-rev-list: factor out the commit printing from "main()" Functions that do many things are bad. We should basically just parse the arguments in main(). We're not quite there yet, but it's a step in the right direction. 2005-06-02 18:19:53 +02:00			`struct commit_list *parents = commit->parents;`
			`while (parents) {`
rev-list: omit duplicated parents. Showing the same parent more than once for a commit does not make much sense downstream, so stop it. This can happen with an incorrectly made merge commit that merges the same parent twice, but can happen in an otherwise sane development history while squishing the history by taking into account only commits that touch specified paths. For example, $ git rev-list --max-count=1 --parents addafaf -- rev-list.c would have to show this commit ancestry graph: .---o---. / \ .------o---. / 93b74bc \ ------o---o-----o---o-----o addafaf d8f6b34 \ / .---o---o---. \ / .---*---. 3815f42 where 5 independent development tracks, only two of which have changes in the specified paths since they forked. The last change for the other three development tracks was done by the same commit before they forked, and we were showing that three times. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-01-30 00:24:42 +01:00			`struct object *o = &(parents->item->object);`
git-rev-list: factor out the commit printing from "main()" Functions that do many things are bad. We should basically just parse the arguments in main(). We're not quite there yet, but it's a step in the right direction. 2005-06-02 18:19:53 +02:00			`parents = parents->next;`
rev-list: omit duplicated parents. Showing the same parent more than once for a commit does not make much sense downstream, so stop it. This can happen with an incorrectly made merge commit that merges the same parent twice, but can happen in an otherwise sane development history while squishing the history by taking into account only commits that touch specified paths. For example, $ git rev-list --max-count=1 --parents addafaf -- rev-list.c would have to show this commit ancestry graph: .---o---. / \ .------o---. / 93b74bc \ ------o---o-----o---o-----o addafaf d8f6b34 \ / .---o---o---. \ / .---*---. 3815f42 where 5 independent development tracks, only two of which have changes in the specified paths since they forked. The last change for the other three development tracks was done by the same commit before they forked, and we were showing that three times. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-01-30 00:24:42 +01:00			`if (o->flags & TMP_MARK)`
			`continue;`
			`printf(" %s", sha1_to_hex(o->sha1));`
			`o->flags \|= TMP_MARK;`
git-rev-list: factor out the commit printing from "main()" Functions that do many things are bad. We should basically just parse the arguments in main(). We're not quite there yet, but it's a step in the right direction. 2005-06-02 18:19:53 +02:00			`}`
rev-list: omit duplicated parents. Showing the same parent more than once for a commit does not make much sense downstream, so stop it. This can happen with an incorrectly made merge commit that merges the same parent twice, but can happen in an otherwise sane development history while squishing the history by taking into account only commits that touch specified paths. For example, $ git rev-list --max-count=1 --parents addafaf -- rev-list.c would have to show this commit ancestry graph: .---o---. / \ .------o---. / 93b74bc \ ------o---o-----o---o-----o addafaf d8f6b34 \ / .---o---o---. \ / .---*---. 3815f42 where 5 independent development tracks, only two of which have changes in the specified paths since they forked. The last change for the other three development tracks was done by the same commit before they forked, and we were showing that three times. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-01-30 00:24:42 +01:00			`/* TMP_MARK is a general purpose flag that can`
			`* be used locally, but the user should clean`
			`* things up after it is done with them.`
			`*/`
			`for (parents = commit->parents;`
			`parents;`
			`parents = parents->next)`
			`parents->item->object.flags &= ~TMP_MARK;`
git-rev-list: factor out the commit printing from "main()" Functions that do many things are bad. We should basically just parse the arguments in main(). We're not quite there yet, but it's a step in the right direction. 2005-06-02 18:19:53 +02:00			`}`
rev-list option parser fix. The big option parser unification broke rev-list the big way; this makes it use options from the parsed revs structure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-16 08:48:27 +02:00			`if (revs.commit_format == CMIT_FMT_ONELINE)`
Introduce --pretty=oneline format. This introduces --pretty=oneline to git-rev-tree and git-rev-list commands to show only the first line of the commit message, without frills. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-08-09 07:15:40 +02:00			`putchar(' ');`
			`else`
			`putchar('\n');`

rev-list option parser fix. The big option parser unification broke rev-list the big way; this makes it use options from the parsed revs structure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-16 08:48:27 +02:00			`if (revs.verbose_header) {`
pretty_print_commit: add different formats You can ask to print out "raw" format (full headers, full body), "medium" format (author and date, full body) or "short" format (author only, condensed body). Use "git-rev-list --pretty=short HEAD \| less -S" for an example. 2005-06-05 18:02:03 +02:00			`static char pretty_header[16384];`
rev-list option parser fix. The big option parser unification broke rev-list the big way; this makes it use options from the parsed revs structure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-16 08:48:27 +02:00			`pretty_print_commit(revs.commit_format, commit, ~0,`
			`pretty_header, sizeof(pretty_header),`
fmt-patch: Support --attach This patch touches a couple of files, because it adds options to print a custom text just after the subject of a commit, and just after the diffstat. [jc: made "many dashes" used as the boundary leader into a single variable, to reduce the possibility of later tweaks to miscount the number of dashes to break it.] Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-20 15:40:29 +02:00			`revs.abbrev, NULL, NULL);`
pretty_print_commit: add different formats You can ask to print out "raw" format (full headers, full body), "medium" format (author and date, full body) or "short" format (author only, condensed body). Use "git-rev-list --pretty=short HEAD \| less -S" for an example. 2005-06-05 18:02:03 +02:00			`printf("%s%c", pretty_header, hdr_termination);`
Make rev-list flush the stdio buffers after each rev. We'd rather get the revisions in a slow but timely manner than have to wait for them. 2005-07-05 01:36:48 +02:00			`}`
			`fflush(stdout);`
Some more memory leak avoidance This is really the dregs of my effort to not waste memory in git-rev-list, and makes barely one percent of a difference in the memory footprint, but hey, it's also a pretty small patch. It discards the parent lists and the commit buffer after the commit has been shown by git-rev-list (and "git log" - which already did the commit buffer part), and frees the commit list entry that was used by the revision walker. The big win would be to get rid of the "refs" pointer in the object structure (another 5%), because it's only used by fsck. That would require some pretty major surgery to fsck, though, so I'm timid and did the less interesting but much easier part instead. This (percentually) makes a bigger difference to "git log" and friends, since those are walking _just_ commits, and thus the list entries tend to be a bigger percentage of the memory use. But the "list all objects" case does improve too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 03:47:58 +02:00			`if (commit->parents) {`
			`free_commit_list(commit->parents);`
			`commit->parents = NULL;`
			`}`
			`if (commit->buffer) {`
			`free(commit->buffer);`
			`commit->buffer = NULL;`
			`}`
[PATCH] Modify git-rev-list to linearise the commit history in merge order. This patch linearises the GIT commit history graph into merge order which is defined by invariants specified in Documentation/git-rev-list.txt. The linearisation produced by this patch is superior in an objective sense to that produced by the existing git-rev-list implementation in that the linearisation produced is guaranteed to have the minimum number of discontinuities, where a discontinuity is defined as an adjacent pair of commits in the output list which are not related in a direct child-parent relationship. With this patch a graph like this: a4 --- \| \ \ \| b4 \| \|/ \| \| a3 \| \| \| \| \| a2 \| \| \| \| c3 \| \| \| \| \| c2 \| b3 \| \| \| /\| \| b2 \| \| \| c1 \| \| / \| b1 a1 \| \| \| a0 \| \| / root Sorts like this: = a4 \| c3 \| c2 \| c1 ^ b4 \| b3 \| b2 \| b1 ^ a3 \| a2 \| a1 \| a0 = root Instead of this: = a4 \| c3 ^ b4 \| a3 ^ c2 ^ b3 ^ a2 ^ b2 ^ c1 ^ a1 ^ b1 ^ a0 = root A test script, t/t6000-rev-list.sh, includes a test which demonstrates that the linearisation produced by --merge-order has less discontinuities than the linearisation produced by git-rev-list without the --merge-order flag specified. To see this, do the following: cd t ./t6000-rev-list.sh cd trash cat actual-default-order cat actual-merge-order The existing behaviour of git-rev-list is preserved, by default. To obtain the modified behaviour, specify --merge-order or --merge-order --show-breaks on the command line. This version of the patch has been tested on the git repository and also on the linux-2.6 repository and has reasonable performance on both - ~50-100% slower than the original algorithm. This version of the patch has incorporated a functional equivalent of the Linus' output limiting algorithm into the merge-order algorithm itself. This operates per the notes associated with Linus' commit 337cb3fb8da45f10fe9a0c3cf571600f55ead2ce. This version has incorporated Linus' feedback regarding proposed changes to rev-list.c. (see: [PATCH] Factor out filtering in rev-list.c) This version has improved the way sort_first_epoch marks commits as uninteresting. For more details about this change, refer to Documentation/git-rev-list.txt and http://blackcubes.dyndns.org/epoch/. Signed-off-by: Jon Seymour <jon.seymour@gmail.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-06 17:39:40 +02:00			`}`

Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`static void process_blob(struct blob *blob,`
			`struct object_array *p,`
			`struct name_path *path,`
			`const char *name)`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`{`
			`struct object *obj = &blob->object;`

First cut at libifying revlist generation This really just splits things up partially, and creates the interface to set things up by parsing the command line. No real code changes so far, although the parsing of filenames is a bit stricter. In particular, if there is a "--", then we do not accept any filenames before it, and if there isn't any "--", then we check that _all_ paths listed are valid, not just the first one. The new argument parsing automatically also gives us "--default" and "--not" handling as in git-rev-parse. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 01:19:46 +01:00			`if (!revs.blob_objects)`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`return;`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`if (obj->flags & (UNINTERESTING \| SEEN))`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`return;`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`obj->flags \|= SEEN;`
Fix memory leak in "git rev-list --objects" Martin Langhoff points out that "git repack -a" ends up using up a lot of memory for big archives, and that git cvsimport probably should do only incremental repacks in order to avoid having repacking flush all the caches. The big majority of the memory usage of repacking is from git rev-list tracking all objects, and this patch should go a long way in avoiding the excessive memory usage: the bulk of it was due to the object names being leaked from the tree parser. For the historic Linux kernel archive, this simple patch does: Before: /usr/bin/time git-rev-list --all --objects > /dev/null 72.45user 0.82system 1:13.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+125376minor)pagefaults 0swaps After: /usr/bin/time git-rev-list --all --objects > /dev/null 75.22user 0.48system 1:16.34elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+43921minor)pagefaults 0swaps where we do end up wasting a bit of time on some extra strdup()s (which could be avoided, but that would require tracking where the pathnames came from), but we avoid a lot of memory usage. Minor page faults track maximum RSS very closely (each page fault maps in one page into memory), so the reduction from 125376 page faults to 43921 means a rough reduction of VM footprint from almost half a gigabyte to about a third of that. Those numbers were also double-checked by looking at "top" while the process was running. (Side note: at least part of the remaining VM footprint is the mapping of the 177MB pack-file, so the remaining memory use is at least partly "well behaved" from a project caching perspective). For the current git archive itself, the memory usage for a "--all --objects" rev-list invocation dropped from 7128 pages to 2318 (27MB to 9MB), so the reduction seems to hold for much smaller projects too. For regular "git-rev-list" usage (ie without the "--objects" flag) this patch has no impact. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-28 20:37:23 +02:00			`name = strdup(name);`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`add_object(obj, p, path, name);`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`}`

Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`static void process_tree(struct tree *tree,`
			`struct object_array *p,`
			`struct name_path *path,`
			`const char *name)`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`{`
			`struct object *obj = &tree->object;`
Remove "tree->entries" tree-entry list from tree parser Instead, just use the tree buffer directly, and use the tree-walk infrastructure to walk the buffers instead of the tree-entry list. The tree-entry list is inefficient, and generates tons of small allocations for no good reason. The tree-walk infrastructure is generally no harder to use than following a linked list, and allows us to do most tree parsing in-place. Some programs still use the old tree-entry lists, and are a bit painful to convert without major surgery. For them we have a helper function that creates a temporary tree-entry list on demand. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-29 21:18:33 +02:00			`struct tree_desc desc;`
tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-30 18:45:45 +02:00			`struct name_entry entry;`
rev-list --objects: use full pathname to help hashing. This helps to group the same files from different revs together, while spreading files with the same basename in different directories, to help pack-object. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`struct name_path me;`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00
First cut at libifying revlist generation This really just splits things up partially, and creates the interface to set things up by parsing the command line. No real code changes so far, although the parsing of filenames is a bit stricter. In particular, if there is a "--", then we do not accept any filenames before it, and if there isn't any "--", then we check that _all_ paths listed are valid, not just the first one. The new argument parsing automatically also gives us "--default" and "--not" handling as in git-rev-parse. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 01:19:46 +01:00			`if (!revs.tree_objects)`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`return;`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`if (obj->flags & (UNINTERESTING \| SEEN))`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`return;`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`if (parse_tree(tree) < 0)`
			`die("bad tree object %s", sha1_to_hex(obj->sha1));`
			`obj->flags \|= SEEN;`
Fix memory leak in "git rev-list --objects" Martin Langhoff points out that "git repack -a" ends up using up a lot of memory for big archives, and that git cvsimport probably should do only incremental repacks in order to avoid having repacking flush all the caches. The big majority of the memory usage of repacking is from git rev-list tracking all objects, and this patch should go a long way in avoiding the excessive memory usage: the bulk of it was due to the object names being leaked from the tree parser. For the historic Linux kernel archive, this simple patch does: Before: /usr/bin/time git-rev-list --all --objects > /dev/null 72.45user 0.82system 1:13.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+125376minor)pagefaults 0swaps After: /usr/bin/time git-rev-list --all --objects > /dev/null 75.22user 0.48system 1:16.34elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+43921minor)pagefaults 0swaps where we do end up wasting a bit of time on some extra strdup()s (which could be avoided, but that would require tracking where the pathnames came from), but we avoid a lot of memory usage. Minor page faults track maximum RSS very closely (each page fault maps in one page into memory), so the reduction from 125376 page faults to 43921 means a rough reduction of VM footprint from almost half a gigabyte to about a third of that. Those numbers were also double-checked by looking at "top" while the process was running. (Side note: at least part of the remaining VM footprint is the mapping of the 177MB pack-file, so the remaining memory use is at least partly "well behaved" from a project caching perspective). For the current git archive itself, the memory usage for a "--all --objects" rev-list invocation dropped from 7128 pages to 2318 (27MB to 9MB), so the reduction seems to hold for much smaller projects too. For regular "git-rev-list" usage (ie without the "--objects" flag) this patch has no impact. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-28 20:37:23 +02:00			`name = strdup(name);`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`add_object(obj, p, path, name);`
rev-list --objects: use full pathname to help hashing. This helps to group the same files from different revs together, while spreading files with the same basename in different directories, to help pack-object. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-23 07:10:24 +01:00			`me.up = path;`
			`me.elem = name;`
			`me.elem_len = strlen(name);`
Remove "tree->entries" tree-entry list from tree parser Instead, just use the tree buffer directly, and use the tree-walk infrastructure to walk the buffers instead of the tree-entry list. The tree-entry list is inefficient, and generates tons of small allocations for no good reason. The tree-walk infrastructure is generally no harder to use than following a linked list, and allows us to do most tree parsing in-place. Some programs still use the old tree-entry lists, and are a bit painful to convert without major surgery. For them we have a helper function that creates a temporary tree-entry list on demand. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-29 21:18:33 +02:00
			`desc.buf = tree->buffer;`
			`desc.size = tree->size;`

tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-30 18:45:45 +02:00			`while (tree_entry(&desc, &entry)) {`
			`if (S_ISDIR(entry.mode))`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`process_tree(lookup_tree(entry.sha1), p, &me, entry.path);`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`else`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`process_blob(lookup_blob(entry.sha1), p, &me, entry.path);`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`}`
Make "struct tree" contain the pointer to the tree buffer This allows us to avoid allocating information for names etc, because we can just use the information from the tree buffer directly. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-29 21:16:12 +02:00			`free(tree->buffer);`
			`tree->buffer = NULL;`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`}`

git-rev-list libification: rev-list walking This actually moves the "meat" of the revision walking from rev-list.c to the new library code in revision.h. It introduces the new functions void prepare_revision_walk(struct rev_info revs); struct commit get_revision(struct rev_info *revs); to prepare and then walk the revisions that we have. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-28 20:24:00 +01:00			`static void show_commit_list(struct rev_info *revs)`
git-rev-list: factor out the commit printing from "main()" Functions that do many things are bad. We should basically just parse the arguments in main(). We're not quite there yet, but it's a step in the right direction. 2005-06-02 18:19:53 +02:00			`{`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`int i;`
git-rev-list libification: rev-list walking This actually moves the "meat" of the revision walking from rev-list.c to the new library code in revision.h. It introduces the new functions void prepare_revision_walk(struct rev_info revs); struct commit get_revision(struct rev_info *revs); to prepare and then walk the revisions that we have. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-28 20:24:00 +01:00			`struct commit *commit;`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`struct object_array objects = { 0, 0, NULL };`
git-rev-list: factor out the commit printing from "main()" Functions that do many things are bad. We should basically just parse the arguments in main(). We're not quite there yet, but it's a step in the right direction. 2005-06-02 18:19:53 +02:00
git-rev-list libification: rev-list walking This actually moves the "meat" of the revision walking from rev-list.c to the new library code in revision.h. It introduces the new functions void prepare_revision_walk(struct rev_info revs); struct commit get_revision(struct rev_info *revs); to prepare and then walk the revisions that we have. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-28 20:24:00 +01:00			`while ((commit = get_revision(revs)) != NULL) {`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`process_tree(commit->tree, &objects, NULL, "");`
Rip out merge-order and make "git log <paths>..." work again. Well, assuming breaking --merge-order is fine, here's a patch (on top of the other ones) that makes git log <filename> actually work, as far as I can tell. I didn't add the logic for --before/--after flags, but that should be pretty trivial, and is independent of this anyway. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-01 00:07:20 +01:00			`show_commit(commit);`
git-rev-list: factor out the commit printing from "main()" Functions that do many things are bad. We should basically just parse the arguments in main(). We're not quite there yet, but it's a step in the right direction. 2005-06-02 18:19:53 +02:00			`}`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`for (i = 0; i < revs->pending.nr; i++) {`
			`struct object_array_entry *pending = revs->pending.objects + i;`
Teach git-rev-list about non-commit objects Now you can give git-rev-list tags, trees and blobs, and it will do the proper reachability for them all. Knock wood. Of course, you need the "--objects" flag to do anything but plain commits. 2005-06-29 20:30:24 +02:00			`struct object *obj = pending->item;`
			`const char *name = pending->name;`
			`if (obj->flags & (UNINTERESTING \| SEEN))`
			`continue;`
Shrink "struct object" a bit This shrinks "struct object" by a small amount, by getting rid of the "struct type *" pointer and replacing it with a 3-bit bitfield instead. In addition, we merge the bitfields and the "flags" field, which incidentally should also remove a useless 4-byte padding from the object when in 64-bit mode. Now, our "struct object" is still too damn large, but it's now less obviously bloated, and of the remaining fields, only the "util" (which is not used by most things) is clearly something that should be eventually discarded. This shrinks the "git-rev-list --all" memory use by about 2.5% on the kernel archive (and, perhaps more importantly, on the larger mozilla archive). That may not sound like much, but I suspect it's more on a 64-bit platform. There are other remaining inefficiencies (the parent lists, for example, probably have horrible malloc overhead), but this was pretty obvious. Most of the patch is just changing the comparison of the "type" pointer from one of the constant string pointers to the appropriate new TYPE_xxx small integer constant. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-15 01:45:13 +02:00			`if (obj->type == TYPE_TAG) {`
Teach git-rev-list about non-commit objects Now you can give git-rev-list tags, trees and blobs, and it will do the proper reachability for them all. Knock wood. Of course, you need the "--objects" flag to do anything but plain commits. 2005-06-29 20:30:24 +02:00			`obj->flags \|= SEEN;`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`add_object_array(obj, name, &objects);`
Teach git-rev-list about non-commit objects Now you can give git-rev-list tags, trees and blobs, and it will do the proper reachability for them all. Knock wood. Of course, you need the "--objects" flag to do anything but plain commits. 2005-06-29 20:30:24 +02:00			`continue;`
			`}`
Shrink "struct object" a bit This shrinks "struct object" by a small amount, by getting rid of the "struct type *" pointer and replacing it with a 3-bit bitfield instead. In addition, we merge the bitfields and the "flags" field, which incidentally should also remove a useless 4-byte padding from the object when in 64-bit mode. Now, our "struct object" is still too damn large, but it's now less obviously bloated, and of the remaining fields, only the "util" (which is not used by most things) is clearly something that should be eventually discarded. This shrinks the "git-rev-list --all" memory use by about 2.5% on the kernel archive (and, perhaps more importantly, on the larger mozilla archive). That may not sound like much, but I suspect it's more on a 64-bit platform. There are other remaining inefficiencies (the parent lists, for example, probably have horrible malloc overhead), but this was pretty obvious. Most of the patch is just changing the comparison of the "type" pointer from one of the constant string pointers to the appropriate new TYPE_xxx small integer constant. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-15 01:45:13 +02:00			`if (obj->type == TYPE_TREE) {`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`process_tree((struct tree *)obj, &objects, NULL, name);`
Teach git-rev-list about non-commit objects Now you can give git-rev-list tags, trees and blobs, and it will do the proper reachability for them all. Knock wood. Of course, you need the "--objects" flag to do anything but plain commits. 2005-06-29 20:30:24 +02:00			`continue;`
			`}`
Shrink "struct object" a bit This shrinks "struct object" by a small amount, by getting rid of the "struct type *" pointer and replacing it with a 3-bit bitfield instead. In addition, we merge the bitfields and the "flags" field, which incidentally should also remove a useless 4-byte padding from the object when in 64-bit mode. Now, our "struct object" is still too damn large, but it's now less obviously bloated, and of the remaining fields, only the "util" (which is not used by most things) is clearly something that should be eventually discarded. This shrinks the "git-rev-list --all" memory use by about 2.5% on the kernel archive (and, perhaps more importantly, on the larger mozilla archive). That may not sound like much, but I suspect it's more on a 64-bit platform. There are other remaining inefficiencies (the parent lists, for example, probably have horrible malloc overhead), but this was pretty obvious. Most of the patch is just changing the comparison of the "type" pointer from one of the constant string pointers to the appropriate new TYPE_xxx small integer constant. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-15 01:45:13 +02:00			`if (obj->type == TYPE_BLOB) {`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`process_blob((struct blob *)obj, &objects, NULL, name);`
Teach git-rev-list about non-commit objects Now you can give git-rev-list tags, trees and blobs, and it will do the proper reachability for them all. Knock wood. Of course, you need the "--objects" flag to do anything but plain commits. 2005-06-29 20:30:24 +02:00			`continue;`
			`}`
			`die("unknown pending object %s (%s)", sha1_to_hex(obj->sha1), name);`
			`}`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`for (i = 0; i < objects.nr; i++) {`
			`struct object_array_entry *p = objects.objects + i;`

rev-list.c: fix non-grammatical comments. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 10:27:02 +01:00			`/* An object with name "foo\n0000000..." can be used to`
			`* confuse downstream git-pack-objects very badly.`
Fix minor DOS in rev-list. A carefully crafted pathname can be used to disrupt downstream git-pack-objects that uses 'git-rev-list --objects' output. Prevent this. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-03 02:29:21 +02:00			`*/`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`const char *ep = strchr(p->name, '\n');`
Fix minor DOS in rev-list. A carefully crafted pathname can be used to disrupt downstream git-pack-objects that uses 'git-rev-list --objects' output. Prevent this. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-03 02:29:21 +02:00			`if (ep) {`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`printf("%s %.*s\n", sha1_to_hex(p->item->sha1),`
			`(int) (ep - p->name),`
			`p->name);`
Fix minor DOS in rev-list. A carefully crafted pathname can be used to disrupt downstream git-pack-objects that uses 'git-rev-list --objects' output. Prevent this. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-03 02:29:21 +02:00			`}`
			`else`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`printf("%s %s\n", sha1_to_hex(p->item->sha1), p->name);`
git-rev-list: add option to list all objects (not just commits) When you do git-rev-list --objects $(git-rev-parse HEAD^..HEAD) it now lists not only the "commit difference" between the parent of HEAD and HEAD itself (which is normally just the parent, but in the case of a merge will be all the newly merged commits), but also all the new tree and blob objects that weren't in the original. NOTE! It doesn't walk all the way to the root, so it doesn't do a full object search in the full old history. Instead, it will only look as far back in the history as it needs to resolve the commits. Thus, if the commit reverts a blob (or tree) back to a state much further back in history, we may end up listing some blobs (or trees) as "new" even though they exist further back. Regardless, the list of objects will be a superset (usually exact) list of objects needed to go from the beginning commit to ending commit. As a particularly obvious special case, git-rev-list --objects HEAD will end up listing every single object that is reachable from the HEAD commit. Side note: the objects are sorted by "recency", with commits first. 2005-06-25 07:56:58 +02:00			`}`
			`}`

git-rev-list: add "--bisect" flag to find the "halfway" point This is useful for doing binary searching for problems. You start with a known good and known bad point, and you then test the "halfway" point in between: git-rev-list --bisect bad ^good and you test that. If that one tests good, you now still have a known bad case, but two known good points, and you can bisect again: git-rev-list --bisect bad ^good1 ^good2 and test that point. If that point is bad, you now use that as your known-bad starting point: git-rev-list --bisect newbad ^good1 ^good2 and basically at every iteration you shrink your list of commits by half: you're binary searching for the point where the troubles started, even though there isn't a nice linear ordering. 2005-06-18 07:54:50 +02:00			`/*`
			`* This is a truly stupid algorithm, but it's only`
			`* used for bisection, and we just don't care enough.`
			`*`
			`* We care just barely enough to avoid recursing for`
			`* non-merge entries.`
			`*/`
			`static int count_distance(struct commit_list *entry)`
			`{`
			`int nr = 0;`

			`while (entry) {`
			`struct commit *commit = entry->item;`
			`struct commit_list *p;`

			`if (commit->object.flags & (UNINTERESTING \| COUNTED))`
			`break;`
rev-lib: Make it easy to do rename tracking (take 2) prune_fn in the rev_info structure is called in place of try_to_simplify_commit. This makes it possible to do rename tracking with a custom try_to_simplify_commit-like function. This commit also introduces init_revisions which initialises the rev_info structure with default values. Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-10 10:21:39 +01:00			`if (!revs.prune_fn \|\| (commit->object.flags & TREECHANGE))`
bisect: limit the searchspace by pathspecs It was surprisingly easy to do. git bisect start <pathspec> followed by all the normal "git bisect good/bad" stuff. Almost totally untested, and I guarantee that if your pathnames have spaces in them (or your GIT_DIR has spaces in it) this won't work. I don't know how to fix that, my shell programming isn't good enough. This involves small changes to make "git-rev-list --bisect" work in the presense of a pathspec limiter, and then truly trivial (and that's the broken part) changes to make "git bisect" save away and use the pathspec. I tried one bisection, and a "git bisect visualize", and it all looked correct. But hey, don't be surprised if it has problems. Linus Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-11-27 20:32:03 +01:00			`nr++;`
git-rev-list: add "--bisect" flag to find the "halfway" point This is useful for doing binary searching for problems. You start with a known good and known bad point, and you then test the "halfway" point in between: git-rev-list --bisect bad ^good and you test that. If that one tests good, you now still have a known bad case, but two known good points, and you can bisect again: git-rev-list --bisect bad ^good1 ^good2 and test that point. If that point is bad, you now use that as your known-bad starting point: git-rev-list --bisect newbad ^good1 ^good2 and basically at every iteration you shrink your list of commits by half: you're binary searching for the point where the troubles started, even though there isn't a nice linear ordering. 2005-06-18 07:54:50 +02:00			`commit->object.flags \|= COUNTED;`
			`p = commit->parents;`
			`entry = p;`
			`if (p) {`
			`p = p->next;`
			`while (p) {`
			`nr += count_distance(p);`
			`p = p->next;`
			`}`
			`}`
			`}`
bisect: limit the searchspace by pathspecs It was surprisingly easy to do. git bisect start <pathspec> followed by all the normal "git bisect good/bad" stuff. Almost totally untested, and I guarantee that if your pathnames have spaces in them (or your GIT_DIR has spaces in it) this won't work. I don't know how to fix that, my shell programming isn't good enough. This involves small changes to make "git-rev-list --bisect" work in the presense of a pathspec limiter, and then truly trivial (and that's the broken part) changes to make "git bisect" save away and use the pathspec. I tried one bisection, and a "git bisect visualize", and it all looked correct. But hey, don't be surprised if it has problems. Linus Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-11-27 20:32:03 +01:00
git-rev-list: add "--bisect" flag to find the "halfway" point This is useful for doing binary searching for problems. You start with a known good and known bad point, and you then test the "halfway" point in between: git-rev-list --bisect bad ^good and you test that. If that one tests good, you now still have a known bad case, but two known good points, and you can bisect again: git-rev-list --bisect bad ^good1 ^good2 and test that point. If that point is bad, you now use that as your known-bad starting point: git-rev-list --bisect newbad ^good1 ^good2 and basically at every iteration you shrink your list of commits by half: you're binary searching for the point where the troubles started, even though there isn't a nice linear ordering. 2005-06-18 07:54:50 +02:00			`return nr;`
			`}`

Avoid warning about function without return. Strangely, this warning only shows up when not compiling with "-O2", which is why I didn't see it originally. 2005-06-19 05:02:49 +02:00			`static void clear_distance(struct commit_list *list)`
git-rev-list: add "--bisect" flag to find the "halfway" point This is useful for doing binary searching for problems. You start with a known good and known bad point, and you then test the "halfway" point in between: git-rev-list --bisect bad ^good and you test that. If that one tests good, you now still have a known bad case, but two known good points, and you can bisect again: git-rev-list --bisect bad ^good1 ^good2 and test that point. If that point is bad, you now use that as your known-bad starting point: git-rev-list --bisect newbad ^good1 ^good2 and basically at every iteration you shrink your list of commits by half: you're binary searching for the point where the troubles started, even though there isn't a nice linear ordering. 2005-06-18 07:54:50 +02:00			`{`
			`while (list) {`
			`struct commit *commit = list->item;`
			`commit->object.flags &= ~COUNTED;`
			`list = list->next;`
			`}`
			`}`

			`static struct commit_list find_bisection(struct commit_list list)`
			`{`
			`int nr, closest;`
			`struct commit_list p, best;`

			`nr = 0;`
			`p = list;`
			`while (p) {`
rev-lib: Make it easy to do rename tracking (take 2) prune_fn in the rev_info structure is called in place of try_to_simplify_commit. This makes it possible to do rename tracking with a custom try_to_simplify_commit-like function. This commit also introduces init_revisions which initialises the rev_info structure with default values. Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-10 10:21:39 +01:00			`if (!revs.prune_fn \|\| (p->item->object.flags & TREECHANGE))`
bisect: limit the searchspace by pathspecs It was surprisingly easy to do. git bisect start <pathspec> followed by all the normal "git bisect good/bad" stuff. Almost totally untested, and I guarantee that if your pathnames have spaces in them (or your GIT_DIR has spaces in it) this won't work. I don't know how to fix that, my shell programming isn't good enough. This involves small changes to make "git-rev-list --bisect" work in the presense of a pathspec limiter, and then truly trivial (and that's the broken part) changes to make "git bisect" save away and use the pathspec. I tried one bisection, and a "git bisect visualize", and it all looked correct. But hey, don't be surprised if it has problems. Linus Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-11-27 20:32:03 +01:00			`nr++;`
git-rev-list: add "--bisect" flag to find the "halfway" point This is useful for doing binary searching for problems. You start with a known good and known bad point, and you then test the "halfway" point in between: git-rev-list --bisect bad ^good and you test that. If that one tests good, you now still have a known bad case, but two known good points, and you can bisect again: git-rev-list --bisect bad ^good1 ^good2 and test that point. If that point is bad, you now use that as your known-bad starting point: git-rev-list --bisect newbad ^good1 ^good2 and basically at every iteration you shrink your list of commits by half: you're binary searching for the point where the troubles started, even though there isn't a nice linear ordering. 2005-06-18 07:54:50 +02:00			`p = p->next;`
			`}`
			`closest = 0;`
			`best = list;`

bisect: limit the searchspace by pathspecs It was surprisingly easy to do. git bisect start <pathspec> followed by all the normal "git bisect good/bad" stuff. Almost totally untested, and I guarantee that if your pathnames have spaces in them (or your GIT_DIR has spaces in it) this won't work. I don't know how to fix that, my shell programming isn't good enough. This involves small changes to make "git-rev-list --bisect" work in the presense of a pathspec limiter, and then truly trivial (and that's the broken part) changes to make "git bisect" save away and use the pathspec. I tried one bisection, and a "git bisect visualize", and it all looked correct. But hey, don't be surprised if it has problems. Linus Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-11-27 20:32:03 +01:00			`for (p = list; p; p = p->next) {`
			`int distance;`

rev-lib: Make it easy to do rename tracking (take 2) prune_fn in the rev_info structure is called in place of try_to_simplify_commit. This makes it possible to do rename tracking with a custom try_to_simplify_commit-like function. This commit also introduces init_revisions which initialises the rev_info structure with default values. Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-10 10:21:39 +01:00			`if (revs.prune_fn && !(p->item->object.flags & TREECHANGE))`
bisect: limit the searchspace by pathspecs It was surprisingly easy to do. git bisect start <pathspec> followed by all the normal "git bisect good/bad" stuff. Almost totally untested, and I guarantee that if your pathnames have spaces in them (or your GIT_DIR has spaces in it) this won't work. I don't know how to fix that, my shell programming isn't good enough. This involves small changes to make "git-rev-list --bisect" work in the presense of a pathspec limiter, and then truly trivial (and that's the broken part) changes to make "git bisect" save away and use the pathspec. I tried one bisection, and a "git bisect visualize", and it all looked correct. But hey, don't be surprised if it has problems. Linus Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-11-27 20:32:03 +01:00			`continue;`

			`distance = count_distance(p);`
git-rev-list: add "--bisect" flag to find the "halfway" point This is useful for doing binary searching for problems. You start with a known good and known bad point, and you then test the "halfway" point in between: git-rev-list --bisect bad ^good and you test that. If that one tests good, you now still have a known bad case, but two known good points, and you can bisect again: git-rev-list --bisect bad ^good1 ^good2 and test that point. If that point is bad, you now use that as your known-bad starting point: git-rev-list --bisect newbad ^good1 ^good2 and basically at every iteration you shrink your list of commits by half: you're binary searching for the point where the troubles started, even though there isn't a nice linear ordering. 2005-06-18 07:54:50 +02:00			`clear_distance(list);`
			`if (nr - distance < distance)`
			`distance = nr - distance;`
			`if (distance > closest) {`
			`best = p;`
			`closest = distance;`
			`}`
			`}`
			`if (best)`
			`best->next = NULL;`
			`return best;`
			`}`

rev-list --objects-edge This new flag is similar to --objects, but causes rev-list to show list of "uninteresting" commits that appear on the edge commit prefixed with '-'. Downstream pack-objects will be changed to take these as hints to use the trees and blobs contained with them as base objects of resulting pack, producing an incomplete (not self-contained) pack. Such a pack cannot be used in .git/objects/pack (it is prevented by git-index-pack erroring out if it is fed to git-fetch-pack -k or git-clone-pack), but would be useful when transferring only small changes to huge blobs. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 12:32:31 +01:00			`static void mark_edge_parents_uninteresting(struct commit *commit)`
			`{`
			`struct commit_list *parents;`

			`for (parents = commit->parents; parents; parents = parents->next) {`
			`struct commit *parent = parents->item;`
			`if (!(parent->object.flags & UNINTERESTING))`
			`continue;`
			`mark_tree_uninteresting(parent->tree);`
First cut at libifying revlist generation This really just splits things up partially, and creates the interface to set things up by parsing the command line. No real code changes so far, although the parsing of filenames is a bit stricter. In particular, if there is a "--", then we do not accept any filenames before it, and if there isn't any "--", then we check that _all_ paths listed are valid, not just the first one. The new argument parsing automatically also gives us "--default" and "--not" handling as in git-rev-parse. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 01:19:46 +01:00			`if (revs.edge_hint && !(parent->object.flags & SHOWN)) {`
rev-list --objects-edge: remove duplicated edge commit output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-24 08:44:15 +01:00			`parent->object.flags \|= SHOWN;`
rev-list --objects-edge This new flag is similar to --objects, but causes rev-list to show list of "uninteresting" commits that appear on the edge commit prefixed with '-'. Downstream pack-objects will be changed to take these as hints to use the trees and blobs contained with them as base objects of resulting pack, producing an incomplete (not self-contained) pack. Such a pack cannot be used in .git/objects/pack (it is prevented by git-index-pack erroring out if it is fed to git-fetch-pack -k or git-clone-pack), but would be useful when transferring only small changes to huge blobs. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 12:32:31 +01:00			`printf("-%s\n", sha1_to_hex(parent->object.sha1));`
rev-list --objects-edge: remove duplicated edge commit output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-24 08:44:15 +01:00			`}`
rev-list --objects-edge This new flag is similar to --objects, but causes rev-list to show list of "uninteresting" commits that appear on the edge commit prefixed with '-'. Downstream pack-objects will be changed to take these as hints to use the trees and blobs contained with them as base objects of resulting pack, producing an incomplete (not self-contained) pack. Such a pack cannot be used in .git/objects/pack (it is prevented by git-index-pack erroring out if it is fed to git-fetch-pack -k or git-clone-pack), but would be useful when transferring only small changes to huge blobs. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 12:32:31 +01:00			`}`
			`}`

[PATCH] Re-organize "git-rev-list --objects" logic The logic to calculate the full object list used to be very inter-twined with the logic that looked up the commits. For no good reason - it's actually a lot simpler to just do that logic as a separate pass. This improves performance a bit, and uses slightly less memory in my tests, but more importantly it makes the code simpler to work with and follow what it does. The performance win is less than I had hoped for, but I get: Before: [torvalds@g5 linux]$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD \| wc -l 13.64user 0.42system 0:14.13elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+47947minor)pagefaults 0swaps 58945 After: [torvalds@g5 linux]$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD \| wc -l 11.80user 0.36system 0:12.16elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+42684minor)pagefaults 0swaps 58945 ie it improved by 2 seconds, and took a 5000+ fewer pages (hey, that's 20MB out of 174MB to go). And got the same number of objects (in theory, the more expensive one might find some more shared objects to avoid. In practice it obviously doesn't). I know how to make it use _lots_ less memory, which will probably speed it up. But that's for another time, and I'd prefer to see this go in first. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-16 00:14:29 +02:00			`static void mark_edges_uninteresting(struct commit_list *list)`
			`{`
			`for ( ; list; list = list->next) {`
rev-list --objects-edge This new flag is similar to --objects, but causes rev-list to show list of "uninteresting" commits that appear on the edge commit prefixed with '-'. Downstream pack-objects will be changed to take these as hints to use the trees and blobs contained with them as base objects of resulting pack, producing an incomplete (not self-contained) pack. Such a pack cannot be used in .git/objects/pack (it is prevented by git-index-pack erroring out if it is fed to git-fetch-pack -k or git-clone-pack), but would be useful when transferring only small changes to huge blobs. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 12:32:31 +01:00			`struct commit *commit = list->item;`
[PATCH] Re-organize "git-rev-list --objects" logic The logic to calculate the full object list used to be very inter-twined with the logic that looked up the commits. For no good reason - it's actually a lot simpler to just do that logic as a separate pass. This improves performance a bit, and uses slightly less memory in my tests, but more importantly it makes the code simpler to work with and follow what it does. The performance win is less than I had hoped for, but I get: Before: [torvalds@g5 linux]$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD \| wc -l 13.64user 0.42system 0:14.13elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+47947minor)pagefaults 0swaps 58945 After: [torvalds@g5 linux]$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD \| wc -l 11.80user 0.36system 0:12.16elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+42684minor)pagefaults 0swaps 58945 ie it improved by 2 seconds, and took a 5000+ fewer pages (hey, that's 20MB out of 174MB to go). And got the same number of objects (in theory, the more expensive one might find some more shared objects to avoid. In practice it obviously doesn't). I know how to make it use _lots_ less memory, which will probably speed it up. But that's for another time, and I'd prefer to see this go in first. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-16 00:14:29 +02:00
rev-list --objects-edge This new flag is similar to --objects, but causes rev-list to show list of "uninteresting" commits that appear on the edge commit prefixed with '-'. Downstream pack-objects will be changed to take these as hints to use the trees and blobs contained with them as base objects of resulting pack, producing an incomplete (not self-contained) pack. Such a pack cannot be used in .git/objects/pack (it is prevented by git-index-pack erroring out if it is fed to git-fetch-pack -k or git-clone-pack), but would be useful when transferring only small changes to huge blobs. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 12:32:31 +01:00			`if (commit->object.flags & UNINTERESTING) {`
			`mark_tree_uninteresting(commit->tree);`
			`continue;`
[PATCH] Re-organize "git-rev-list --objects" logic The logic to calculate the full object list used to be very inter-twined with the logic that looked up the commits. For no good reason - it's actually a lot simpler to just do that logic as a separate pass. This improves performance a bit, and uses slightly less memory in my tests, but more importantly it makes the code simpler to work with and follow what it does. The performance win is less than I had hoped for, but I get: Before: [torvalds@g5 linux]$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD \| wc -l 13.64user 0.42system 0:14.13elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+47947minor)pagefaults 0swaps 58945 After: [torvalds@g5 linux]$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD \| wc -l 11.80user 0.36system 0:12.16elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+42684minor)pagefaults 0swaps 58945 ie it improved by 2 seconds, and took a 5000+ fewer pages (hey, that's 20MB out of 174MB to go). And got the same number of objects (in theory, the more expensive one might find some more shared objects to avoid. In practice it obviously doesn't). I know how to make it use _lots_ less memory, which will probably speed it up. But that's for another time, and I'd prefer to see this go in first. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-16 00:14:29 +02:00			`}`
rev-list --objects-edge This new flag is similar to --objects, but causes rev-list to show list of "uninteresting" commits that appear on the edge commit prefixed with '-'. Downstream pack-objects will be changed to take these as hints to use the trees and blobs contained with them as base objects of resulting pack, producing an incomplete (not self-contained) pack. Such a pack cannot be used in .git/objects/pack (it is prevented by git-index-pack erroring out if it is fed to git-fetch-pack -k or git-clone-pack), but would be useful when transferring only small changes to huge blobs. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-19 12:32:31 +01:00			`mark_edge_parents_uninteresting(commit);`
[PATCH] Re-organize "git-rev-list --objects" logic The logic to calculate the full object list used to be very inter-twined with the logic that looked up the commits. For no good reason - it's actually a lot simpler to just do that logic as a separate pass. This improves performance a bit, and uses slightly less memory in my tests, but more importantly it makes the code simpler to work with and follow what it does. The performance win is less than I had hoped for, but I get: Before: [torvalds@g5 linux]$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD \| wc -l 13.64user 0.42system 0:14.13elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+47947minor)pagefaults 0swaps 58945 After: [torvalds@g5 linux]$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD \| wc -l 11.80user 0.36system 0:12.16elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+42684minor)pagefaults 0swaps 58945 ie it improved by 2 seconds, and took a 5000+ fewer pages (hey, that's 20MB out of 174MB to go). And got the same number of objects (in theory, the more expensive one might find some more shared objects to avoid. In practice it obviously doesn't). I know how to make it use _lots_ less memory, which will probably speed it up. But that's for another time, and I'd prefer to see this go in first. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-16 00:14:29 +02:00			`}`
			`}`

Make "git rev-list" be a builtin This was surprisingly easy. The diff is truly minimal: rename "main()" to "cmd_rev_list()" in rev-list.c, and rename the whole file to reflect its new built-in status. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 23:19:20 +02:00			`int cmd_rev_list(int argc, const char argv, char envp)`
Add "rev-list" program that uses the new time-based commit listing. This is probably what you'd want to see for "git log". 2005-04-24 04:04:40 +02:00			`{`
First cut at libifying revlist generation This really just splits things up partially, and creates the interface to set things up by parsing the command line. No real code changes so far, although the parsing of filenames is a bit stricter. In particular, if there is a "--", then we do not accept any filenames before it, and if there isn't any "--", then we check that _all_ paths listed are valid, not just the first one. The new argument parsing automatically also gives us "--default" and "--not" handling as in git-rev-parse. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 01:19:46 +01:00			`struct commit_list *list;`
Splitting rev-list into revisions lib, end of beginning. This makes the rewrite easier to validate in that revision flag parsing and warlking part are now all in rev_info structure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-27 17:54:36 +01:00			`int i;`
Add "rev-list" program that uses the new time-based commit listing. This is probably what you'd want to see for "git log". 2005-04-24 04:04:40 +02:00
rev-list option parser fix. The big option parser unification broke rev-list the big way; this makes it use options from the parsed revs structure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-16 08:48:27 +02:00			`init_revisions(&revs);`
			`revs.abbrev = 0;`
			`revs.commit_format = CMIT_FMT_UNSPECIFIED;`
git-rev-list libification: rev-list walking This actually moves the "meat" of the revision walking from rev-list.c to the new library code in revision.h. It introduces the new functions void prepare_revision_walk(struct rev_info revs); struct commit get_revision(struct rev_info *revs); to prepare and then walk the revisions that we have. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-28 20:24:00 +01:00			`argc = setup_revisions(argc, argv, &revs, NULL);`
First cut at libifying revlist generation This really just splits things up partially, and creates the interface to set things up by parsing the command line. No real code changes so far, although the parsing of filenames is a bit stricter. In particular, if there is a "--", then we do not accept any filenames before it, and if there isn't any "--", then we check that _all_ paths listed are valid, not just the first one. The new argument parsing automatically also gives us "--default" and "--not" handling as in git-rev-parse. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 01:19:46 +01:00
[PATCH] control/limit output of git-rev-list gitweb.cgi's default view is the log of the last day and git-rev-list can stop crawling the whole repo if we have all our data to display in the browser. Also the rss-feed query needs only the last 20 items. This will speeds up these queries dramatically. usage: rev-list [OPTION] commit-id --max-count=nr --max-age=epoch --min-age=epoch Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-06 10:00:11 +02:00			`for (i = 1 ; i < argc; i++) {`
Teach git-rev-list to follow just a specified set of files This is the first cut at a git-rev-list that knows to ignore commits that don't change a certain file (or set of files). NOTE! For now it only prunes _merge_ commits, and follows the parent where there are no differences in the set of files specified. In the long run, I'd like to make it re-write the straight-line history too, but for now the merge simplification is much more fundamentally important (the rewriting of straight-line history is largely a separate simplification phase, but the merge simplification needs to happen early if we want to optimize away unnecessary commit parsing). If all parents of a merge change some of the files, the merge is left as is, so the end result is in no way guaranteed to be a linear history, but it will often be a lot /more/ linear than the full tree, since it prunes out parents that didn't matter for that set of files. As an example from the current kernel: [torvalds@g5 linux]$ git-rev-list HEAD \| wc -l 9885 [torvalds@g5 linux]$ git-rev-list HEAD -- Makefile \| wc -l 4084 [torvalds@g5 linux]$ git-rev-list HEAD -- drivers/usb \| wc -l 5206 and you can also use 'gitk' to more visually see the pruning of the history tree, with something like gitk -- drivers/usb showing a simplified history that tries to follow the first parent in a merge that is the parent that fully defines drivers/usb/. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:25:09 +02:00			`const char *arg = argv[i];`
[PATCH] control/limit output of git-rev-list gitweb.cgi's default view is the log of the last day and git-rev-list can stop crawling the whole repo if we have all our data to display in the browser. Also the rss-feed query needs only the last 20 items. This will speeds up these queries dramatically. usage: rev-list [OPTION] commit-id --max-count=nr --max-age=epoch --min-age=epoch Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-06 10:00:11 +02:00
git-rev-list: add "end" commit and "--header" flag The "end" commit is just faking it right now, it's sorting things purely by date, so this is _not_ a reachability analysis. Some day. The "--header" flag causes the commit message to be printed out, with a NUL character separator after it for parseability. This allows you to do things like use "grep -z" to grep for certain authors etc. 2005-05-26 03:29:09 +02:00			`if (!strcmp(arg, "--header")) {`
rev-list option parser fix. The big option parser unification broke rev-list the big way; this makes it use options from the parsed revs structure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-16 08:48:27 +02:00			`revs.verbose_header = 1;`
git-rev-list: add "--pretty" command line option That pretty-prints the resulting commit messages, so git-rev-list --pretty HEAD v2.6.12-rc5 \| less -S basically ends up being a log of the changes between -rc5 and current head. It uses the pretty-printing helper function I just extracted from diff-tree.c. 2005-06-01 17:42:22 +02:00			`continue;`
			`}`
rev-list --timestamp This prefixes the raw commit timestamp to the output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-22 09:22:00 +01:00			`if (!strcmp(arg, "--timestamp")) {`
			`show_timestamp = 1;`
			`continue;`
			`}`
git-rev-list: add "--bisect" flag to find the "halfway" point This is useful for doing binary searching for problems. You start with a known good and known bad point, and you then test the "halfway" point in between: git-rev-list --bisect bad ^good and you test that. If that one tests good, you now still have a known bad case, but two known good points, and you can bisect again: git-rev-list --bisect bad ^good1 ^good2 and test that point. If that point is bad, you now use that as your known-bad starting point: git-rev-list --bisect newbad ^good1 ^good2 and basically at every iteration you shrink your list of commits by half: you're binary searching for the point where the troubles started, even though there isn't a nice linear ordering. 2005-06-18 07:54:50 +02:00			`if (!strcmp(arg, "--bisect")) {`
			`bisect_list = 1;`
			`continue;`
			`}`
First cut at libifying revlist generation This really just splits things up partially, and creates the interface to set things up by parsing the command line. No real code changes so far, although the parsing of filenames is a bit stricter. In particular, if there is a "--", then we do not accept any filenames before it, and if there isn't any "--", then we check that _all_ paths listed are valid, not just the first one. The new argument parsing automatically also gives us "--default" and "--not" handling as in git-rev-parse. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 01:19:46 +01:00			`usage(rev_list_usage);`
git-rev-list: add "end" commit and "--header" flag The "end" commit is just faking it right now, it's sorting things purely by date, so this is _not_ a reachability analysis. Some day. The "--header" flag causes the commit message to be printed out, with a NUL character separator after it for parseability. This allows you to do things like use "grep -z" to grep for certain authors etc. 2005-05-26 03:29:09 +02:00
[PATCH] control/limit output of git-rev-list gitweb.cgi's default view is the log of the last day and git-rev-list can stop crawling the whole repo if we have all our data to display in the browser. Also the rss-feed query needs only the last 20 items. This will speeds up these queries dramatically. usage: rev-list [OPTION] commit-id --max-count=nr --max-age=epoch --min-age=epoch Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-06 10:00:11 +02:00			`}`
rev-list option parser fix. The big option parser unification broke rev-list the big way; this makes it use options from the parsed revs structure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-16 08:48:27 +02:00			`if (revs.commit_format != CMIT_FMT_UNSPECIFIED) {`
			`/* The command line has a --pretty */`
			`hdr_termination = '\n';`
			`if (revs.commit_format == CMIT_FMT_ONELINE)`
Log message printout cleanups On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-17 20:59:32 +02:00			`header_prefix = "";`
rev-list option parser fix. The big option parser unification broke rev-list the big way; this makes it use options from the parsed revs structure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-16 08:48:27 +02:00			`else`
Log message printout cleanups On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-17 20:59:32 +02:00			`header_prefix = "commit ";`
rev-list option parser fix. The big option parser unification broke rev-list the big way; this makes it use options from the parsed revs structure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-16 08:48:27 +02:00			`}`
rev-list --header: output format fix Initial fix prepared by Johannes, but I did it slightly differently. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-17 21:42:36 +02:00			`else if (revs.verbose_header)`
			`/* Only --header was specified */`
			`revs.commit_format = CMIT_FMT_RAW;`
[PATCH] control/limit output of git-rev-list gitweb.cgi's default view is the log of the last day and git-rev-list can stop crawling the whole repo if we have all our data to display in the browser. Also the rss-feed query needs only the last 20 items. This will speeds up these queries dramatically. usage: rev-list [OPTION] commit-id --max-count=nr --max-age=epoch --min-age=epoch Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-06 10:00:11 +02:00
First cut at libifying revlist generation This really just splits things up partially, and creates the interface to set things up by parsing the command line. No real code changes so far, although the parsing of filenames is a bit stricter. In particular, if there is a "--", then we do not accept any filenames before it, and if there isn't any "--", then we check that _all_ paths listed are valid, not just the first one. The new argument parsing automatically also gives us "--default" and "--not" handling as in git-rev-parse. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-26 01:19:46 +01:00			`list = revs.commits;`

Fix up rev-list option parsing. rev-list does not take diff options, so barf after seeing some. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-15 07:43:34 +02:00			`if ((!list &&`
			`(!(revs.tag_objects\|\|revs.tree_objects\|\|revs.blob_objects) &&`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`!revs.pending.nr)) \|\|`
Fix up rev-list option parsing. rev-list does not take diff options, so barf after seeing some. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-15 07:43:34 +02:00			`revs.diff)`
git-rev-list: make --dense the default (and introduce "--sparse") This actually does three things: - make "--dense" the default for git-rev-list. Since dense is a no-op if no filenames are given, this doesn't actually change any historical behaviour, but it's logically the right default (if we want to prune on filenames, do it fully. The sparse "merge-only" thing may be useful, but it's not what you'd normally expect) - make "git-rev-parse" show the default revision control before it shows any pathnames. This was a real bug, but nobody would ever have noticed, because the default thing tends to only make sense for git-rev-list, and git-rev-list didn't use to take pathnames. - it changes "git-rev-list" to match the other commands that take a mix of revisions and filenames - it no longer requires the "--" before filenames (although you still need to do it if a filename could be confused with a revision name, eg "gitk" in the git archive) This all just makes for much more pleasant and obvous usage. Just doing a gitk t/ does the obvious thing: it will show the history as it concerns the "t/" subdirectory. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-26 00:24:55 +02:00			`usage(rev_list_usage);`

rev-list option parser fix. The big option parser unification broke rev-list the big way; this makes it use options from the parsed revs structure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-16 08:48:27 +02:00			`save_commit_buffer = revs.verbose_header;`
rev-list: memory usage reduction. We do not need to track object refs, neither we need to save commit unless we are doing verbose header. A lot of traversal happens inside prepare_revision_walk() these days so setting things up before calling that function is necessary. Signed-off-by: Junio C Hamano <junkio@cox.net> Acked-by: Linus Torvalds <torvalds@osdl.org> 2006-03-29 03:28:04 +02:00			`track_object_refs = 0;`
rev-list --bisect: limit list before bisecting. I noticed bisect does not work well without both good and bad. Running this script in git.git repository would give you quite different results: #!/bin/sh initial=e83c5163316f89bfbde7d9ab23ca2e25604af290 mid0=`git rev-list --bisect ^$initial --all` git rev-list $mid0 \| wc -l git rev-list ^$mid0 --all \| wc -l mid1=`git rev-list --bisect --all` git rev-list $mid1 \| wc -l git rev-list ^$mid1 --all \| wc -l The $initial commit is the very first commit you made. The first midpoint bisects things evenly as designed, but the latter does not. The reason I got interested in this was because I was wondering if something like the following would help people converting a huge repository from foreign SCM, or preparing a repository to be fetched over plain dumb HTTP only: #!/bin/sh N=4 P=.git/objects/pack bottom= while test 0 \< $N do N=$((N-1)) if test -z "$bottom" then newbottom=`git rev-list --bisect --all` else newbottom=`git rev-list --bisect ^$bottom --all` fi if test -z "$bottom" then rev_list="$newbottom" elif test 0 = $N then rev_list="^$bottom --all" else rev_list="^$bottom $newbottom" fi p=$(git rev-list --unpacked --objects $rev_list \| git pack-objects $P/pack) git show-index <$P/pack-$p.idx \| wc -l bottom=$newbottom done The idea is to pack older half of the history to one pack, then older half of the remaining history to another, to continue a few times, using finer granularity as we get closer to the tip. This may not matter, since for a truly huge history, running bisect number of times could be quite time consuming, and we might be better off running "git rev-list --all" once into a temporary file, and manually pick cut-off points from the resulting list of commits. After all we are talking about "approximately half" for such an usage, and older history does not matter much. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-15 00:57:32 +02:00			`if (bisect_list)`
			`revs.limited = 1;`
rev-list: memory usage reduction. We do not need to track object refs, neither we need to save commit unless we are doing verbose header. A lot of traversal happens inside prepare_revision_walk() these days so setting things up before calling that function is necessary. Signed-off-by: Junio C Hamano <junkio@cox.net> Acked-by: Linus Torvalds <torvalds@osdl.org> 2006-03-29 03:28:04 +02:00
git-rev-list libification: rev-list walking This actually moves the "meat" of the revision walking from rev-list.c to the new library code in revision.h. It introduces the new functions void prepare_revision_walk(struct rev_info revs); struct commit get_revision(struct rev_info *revs); to prepare and then walk the revisions that we have. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-28 20:24:00 +01:00			`prepare_revision_walk(&revs);`
			`if (revs.tree_objects)`
			`mark_edges_uninteresting(revs.commits);`

			`if (bisect_list)`
			`revs.commits = find_bisection(revs.commits);`
git-rev-list: make --dense the default (and introduce "--sparse") This actually does three things: - make "--dense" the default for git-rev-list. Since dense is a no-op if no filenames are given, this doesn't actually change any historical behaviour, but it's logically the right default (if we want to prune on filenames, do it fully. The sparse "merge-only" thing may be useful, but it's not what you'd normally expect) - make "git-rev-parse" show the default revision control before it shows any pathnames. This was a real bug, but nobody would ever have noticed, because the default thing tends to only make sense for git-rev-list, and git-rev-list didn't use to take pathnames. - it changes "git-rev-list" to match the other commands that take a mix of revisions and filenames - it no longer requires the "--" before filenames (although you still need to do it if a filename could be confused with a revision name, eg "gitk" in the git archive) This all just makes for much more pleasant and obvous usage. Just doing a gitk t/ does the obvious thing: it will show the history as it concerns the "t/" subdirectory. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-26 00:24:55 +02:00
Rip out merge-order and make "git log <paths>..." work again. Well, assuming breaking --merge-order is fine, here's a patch (on top of the other ones) that makes git log <filename> actually work, as far as I can tell. I didn't add the logic for --before/--after flags, but that should be pretty trivial, and is independent of this anyway. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-01 00:07:20 +01:00			`show_commit_list(&revs);`
git-rev-list: use proper lazy reachability analysis This mean sthat you can give a beginning/end pair to git-rev-list, and it will show all entries that are reachable from the beginning but not the end. For example git-rev-list v2.6.12-rc5 v2.6.12-rc4 shows all commits that are in -rc5 but are not in -rc4. 2005-05-31 03:46:32 +02:00
Add "rev-list" program that uses the new time-based commit listing. This is probably what you'd want to see for "git log". 2005-04-24 04:04:40 +02:00			`return 0;`
			`}`