mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-09 02:33:11 +01:00

463 lines

12 KiB

C

Raw Normal View History

Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`/*`
			`* Helper functions for tree diff generation`
			`*/`
			`#include "cache.h"`
			`#include "diff.h"`
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00			`#include "diffcore.h"`
Use blob_, commit_, tag_, and tree_type throughout. This replaces occurences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-02 14:44:09 +02:00			`#include "tree.h"`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`static char malloc_base(const char base, int baselen, const char *path, int pathlen)`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`{`
			`char *newbase = xmalloc(baselen + pathlen + 2);`
			`memcpy(newbase, base, baselen);`
			`memcpy(newbase + baselen, path, pathlen);`
			`memcpy(newbase + baselen + pathlen, "/", 2);`
			`return newbase;`
			`}`

Fix buffer overflow in git diff If PATH_MAX on your system is smaller than a path stored, it may cause buffer overflow and stack corruption in diff_addremove() and diff_change() functions when running git-diff Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-16 16:54:02 +02:00			`static char malloc_fullname(const char base, int baselen, const char *path, int pathlen)`
			`{`
			`char *fullname = xmalloc(baselen + pathlen + 1);`
			`memcpy(fullname, base, baselen);`
			`memcpy(fullname + baselen, path, pathlen);`
			`fullname[baselen + pathlen] = 0;`
			`return fullname;`
			`}`

Make show_entry void Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-14 22:39:27 +02:00			`static void show_entry(struct diff_options opt, const char prefix, struct tree_desc *desc,`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`const char *base, int baselen);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`static int compare_tree_entry(struct tree_desc t1, struct tree_desc t2, const char base, int baselen, struct diff_options opt)`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`{`
			`unsigned mode1, mode2;`
			`const char path1, path2;`
			`const unsigned char sha1, sha2;`
			`int cmp, pathlen1, pathlen2;`
Fix buffer overflow in git diff If PATH_MAX on your system is smaller than a path stored, it may cause buffer overflow and stack corruption in diff_addremove() and diff_change() functions when running git-diff Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-16 16:54:02 +02:00			`char *fullname;`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00
Make the "struct tree_desc" operations available to others We have operations to "extract" and "update" a "struct tree_desc", but we only used them in tree-diff.c and they were static to that file. But other tree traversal functions can use them to their advantage Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-01-31 23:10:56 +01:00			`sha1 = tree_entry_extract(t1, &path1, &mode1);`
			`sha2 = tree_entry_extract(t2, &path2, &mode2);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`pathlen1 = tree_entry_len(path1, sha1);`
			`pathlen2 = tree_entry_len(path2, sha2);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`cmp = base_name_compare(path1, pathlen1, mode1, path2, pathlen2, mode2);`
			`if (cmp < 0) {`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`show_entry(opt, "-", t1, base, baselen);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`return -1;`
			`}`
			`if (cmp > 0) {`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`show_entry(opt, "+", t2, base, baselen);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`return 1;`
			`}`
Make the diff_options bitfields be an unsigned with explicit masks. reverse_diff was a bit-value in disguise, it's merged in the flags now. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 20:05:14 +01:00			`if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2)`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`return 0;`

			`/*`
			`* If the filemode has changed to/from a directory from/to a regular`
			`* file, we need to consider it a remove and an add.`
			`*/`
			`if (S_ISDIR(mode1) != S_ISDIR(mode2)) {`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`show_entry(opt, "-", t1, base, baselen);`
			`show_entry(opt, "+", t2, base, baselen);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`return 0;`
			`}`

Make the diff_options bitfields be an unsigned with explicit masks. reverse_diff was a bit-value in disguise, it's merged in the flags now. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 20:05:14 +01:00			`if (DIFF_OPT_TST(opt, RECURSIVE) && S_ISDIR(mode1)) {`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`int retval;`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`char *newbase = malloc_base(base, baselen, path1, pathlen1);`
Fix buffer overflow in git diff If PATH_MAX on your system is smaller than a path stored, it may cause buffer overflow and stack corruption in diff_addremove() and diff_change() functions when running git-diff Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-16 16:54:02 +02:00			`if (DIFF_OPT_TST(opt, TREE_IN_RECURSIVE)) {`
			`newbase[baselen + pathlen1] = 0;`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`opt->change(opt, mode1, mode2,`
Fix buffer overflow in git diff If PATH_MAX on your system is smaller than a path stored, it may cause buffer overflow and stack corruption in diff_addremove() and diff_change() functions when running git-diff Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-16 16:54:02 +02:00			`sha1, sha2, newbase);`
			`newbase[baselen + pathlen1] = '/';`
			`}`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`retval = diff_tree_sha1(sha1, sha2, newbase, opt);`
			`free(newbase);`
			`return retval;`
			`}`

Fix buffer overflow in git diff If PATH_MAX on your system is smaller than a path stored, it may cause buffer overflow and stack corruption in diff_addremove() and diff_change() functions when running git-diff Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-16 16:54:02 +02:00			`fullname = malloc_fullname(base, baselen, path1, pathlen1);`
			`opt->change(opt, mode1, mode2, sha1, sha2, fullname);`
			`free(fullname);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`return 0;`
			`}`

Set up for better tree diff optimizations This is mainly just a cleanup patch, and sets up for later changes where the tree-diff.c "interesting()" function can return more than just a yes/no value. In particular, it should be quite possible to say "no subsequent entries in this tree can possibly be interesting any more", and thus allow the callers to short-circuit the tree entirely. In fact, changing the callers to do so is trivial, and is really all this patch really does, because changing "interesting()" itself to say that nothing further is going to be interesting is definitely more complicated, considering that we may have arbitrary pathspecs. But in cleaning up the callers, this actually fixes a potential small performance issue in diff_tree(): if the second tree has a lot of uninterestign crud in it, we would keep on doing the "is it interesting?" check on the first tree for each uninteresting entry in the second one. The answer is obviously not going to change, so that was just not helping. The new code is clearer and simpler and avoids this issue entirely. I also renamed "interesting()" to "tree_entry_interesting()", because I got frustrated by the fact that - we actually had another function called "interesting()" in another file, and I couldn't tell from the profiles which one was the one that mattered more. - when rewriting it to return a ternary value, you can't just do if (interesting(...)) ... any more, but want to assign the return value to a local variable. The name of choice for that variable would normally be "interesting", so I just wanted to make the function name be more specific, and avoid that whole issue (even though I then didn't choose that name for either of the users, just to avoid confusion in the patch itself ;) In other words, this doesn't really change anything, but I think it's a good thing to do, and if somebody comes along and writes the logic for "yeah, none of the pathspecs you have are interesting", we now support that trivially. It could easily be a meaningful optimization for things like "blame", where there's just one pathspec, and stopping when you've seen it would allow you to avoid about 50% of the tree traversals on average. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 23:18:30 +01:00			`/*`
			`* Is a tree entry interesting given the pathspec we have?`
			`*`
			`* Return:`
tree_entry_interesting(): allow it to say "everything is interesting" In addition to optimizing pathspecs that would never match, which was done earlier, this optimizes pathspecs that would always match (e.g. "arch/" while the traversal is already in "arch/i386/" hierarchy). This patch makes the worst case slightly more palatable, while improving average case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-22 01:00:27 +01:00			`* - 2 for "yes, and all subsequent entries will be"`
			`* - 1 for yes`
Set up for better tree diff optimizations This is mainly just a cleanup patch, and sets up for later changes where the tree-diff.c "interesting()" function can return more than just a yes/no value. In particular, it should be quite possible to say "no subsequent entries in this tree can possibly be interesting any more", and thus allow the callers to short-circuit the tree entirely. In fact, changing the callers to do so is trivial, and is really all this patch really does, because changing "interesting()" itself to say that nothing further is going to be interesting is definitely more complicated, considering that we may have arbitrary pathspecs. But in cleaning up the callers, this actually fixes a potential small performance issue in diff_tree(): if the second tree has a lot of uninterestign crud in it, we would keep on doing the "is it interesting?" check on the first tree for each uninteresting entry in the second one. The answer is obviously not going to change, so that was just not helping. The new code is clearer and simpler and avoids this issue entirely. I also renamed "interesting()" to "tree_entry_interesting()", because I got frustrated by the fact that - we actually had another function called "interesting()" in another file, and I couldn't tell from the profiles which one was the one that mattered more. - when rewriting it to return a ternary value, you can't just do if (interesting(...)) ... any more, but want to assign the return value to a local variable. The name of choice for that variable would normally be "interesting", so I just wanted to make the function name be more specific, and avoid that whole issue (even though I then didn't choose that name for either of the users, just to avoid confusion in the patch itself ;) In other words, this doesn't really change anything, but I think it's a good thing to do, and if somebody comes along and writes the logic for "yeah, none of the pathspecs you have are interesting", we now support that trivially. It could easily be a meaningful optimization for things like "blame", where there's just one pathspec, and stopping when you've seen it would allow you to avoid about 50% of the tree traversals on average. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 23:18:30 +01:00			`* - zero for no`
			`* - negative for "no, and no subsequent entries will be either"`
			`*/`
			`static int tree_entry_interesting(struct tree_desc desc, const char base, int baselen, struct diff_options *opt)`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`{`
			`const char *path;`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`const unsigned char *sha1;`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`unsigned mode;`
			`int i;`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`int pathlen;`
Teach tree_entry_interesting() that the tree entries are sorted. When we are looking at a tree entry with pathspecs, if all the pathspecs sort strictly earlier than the entry we are currently looking at, there is no way later entries in the same tree would match our pathspecs, because the entries are sorted. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 17:51:47 +01:00			`int never_interesting = -1;`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00
tree-diff: do not assume we use only one pathspec The way tree-diff was set up assumed we would use only one set of pathspec during the entire life of the program. Move the pathspec related static variables out to diff_options structure so that we can filter commits with one set of paths while show the actual diffs using different set of paths. I suspect this breaks blame.c, and makes "git log paths..." to default to the --full-diff, the latter of which is dealt with the next commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-11 01:39:11 +02:00			`if (!opt->nr_paths)`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`return 1;`

Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`sha1 = tree_entry_extract(desc, &path, &mode);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`pathlen = tree_entry_len(path, sha1);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00
Teach tree_entry_interesting() that the tree entries are sorted. When we are looking at a tree entry with pathspecs, if all the pathspecs sort strictly earlier than the entry we are currently looking at, there is no way later entries in the same tree would match our pathspecs, because the entries are sorted. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 17:51:47 +01:00			`for (i = 0; i < opt->nr_paths; i++) {`
tree-diff: do not assume we use only one pathspec The way tree-diff was set up assumed we would use only one set of pathspec during the entire life of the program. Move the pathspec related static variables out to diff_options structure so that we can filter commits with one set of paths while show the actual diffs using different set of paths. I suspect this breaks blame.c, and makes "git log paths..." to default to the --full-diff, the latter of which is dealt with the next commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-11 01:39:11 +02:00			`const char *match = opt->paths[i];`
			`int matchlen = opt->pathlens[i];`
tree-diff: avoid strncmp() If we already know that some of the pathspecs can match later entries in the tree we are looking at, we do not have to do more expensive strncmp() upfront before comparing the length of the match pattern and the path, as a path longer than the match pattern will not match it, and a path shorter than the match pattern will match only if the path is a directory-component wise prefix of the match pattern. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 20:34:46 +01:00			`int m = -1; /* signals that we haven't called strncmp() */`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00
			`if (baselen >= matchlen) {`
			`/* If it doesn't match, move along... */`
			`if (strncmp(base, match, matchlen))`
			`continue;`

tree_entry_interesting(): allow it to say "everything is interesting" In addition to optimizing pathspecs that would never match, which was done earlier, this optimizes pathspecs that would always match (e.g. "arch/" while the traversal is already in "arch/i386/" hierarchy). This patch makes the worst case slightly more palatable, while improving average case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-22 01:00:27 +01:00			`/*`
			`* The base is a subdirectory of a path which`
			`* was specified, so all of them are interesting.`
			`*/`
			`return 2;`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`}`

			`/* Does the base match? */`
			`if (strncmp(base, match, baselen))`
			`continue;`

			`match += baselen;`
			`matchlen -= baselen;`

tree-diff: avoid strncmp() If we already know that some of the pathspecs can match later entries in the tree we are looking at, we do not have to do more expensive strncmp() upfront before comparing the length of the match pattern and the path, as a path longer than the match pattern will not match it, and a path shorter than the match pattern will match only if the path is a directory-component wise prefix of the match pattern. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 20:34:46 +01:00			`if (never_interesting) {`
			`/*`
			`* We have not seen any match that sorts later`
			`* than the current path.`
			`*/`

			`/*`
			`* Does match sort strictly earlier than path`
			`* with their common parts?`
			`*/`
			`m = strncmp(match, path,`
			`(matchlen < pathlen) ? matchlen : pathlen);`
			`if (m < 0)`
			`continue;`
Teach tree_entry_interesting() that the tree entries are sorted. When we are looking at a tree entry with pathspecs, if all the pathspecs sort strictly earlier than the entry we are currently looking at, there is no way later entries in the same tree would match our pathspecs, because the entries are sorted. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 17:51:47 +01:00
tree-diff: avoid strncmp() If we already know that some of the pathspecs can match later entries in the tree we are looking at, we do not have to do more expensive strncmp() upfront before comparing the length of the match pattern and the path, as a path longer than the match pattern will not match it, and a path shorter than the match pattern will match only if the path is a directory-component wise prefix of the match pattern. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 20:34:46 +01:00			`/*`
			`* If we come here even once, that means there is at`
			`* least one pathspec that would sort equal to or`
			`* later than the path we are currently looking at.`
			`* In other words, if we have never reached this point`
			`* after iterating all pathspecs, it means all`
			`* pathspecs are either outside of base, or inside the`
			`* base but sorts strictly earlier than the current`
			`* one. In either case, they will never match the`
			`* subsequent entries. In such a case, we initialized`
			`* the variable to -1 and that is what will be`
			`* returned, allowing the caller to terminate early.`
			`*/`
			`never_interesting = 0;`
			`}`
Teach tree_entry_interesting() that the tree entries are sorted. When we are looking at a tree entry with pathspecs, if all the pathspecs sort strictly earlier than the entry we are currently looking at, there is no way later entries in the same tree would match our pathspecs, because the entries are sorted. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 17:51:47 +01:00
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`if (pathlen > matchlen)`
			`continue;`

			`if (matchlen > pathlen) {`
			`if (match[pathlen] != '/')`
			`continue;`
			`if (!S_ISDIR(mode))`
			`continue;`
			`}`

tree-diff: avoid strncmp() If we already know that some of the pathspecs can match later entries in the tree we are looking at, we do not have to do more expensive strncmp() upfront before comparing the length of the match pattern and the path, as a path longer than the match pattern will not match it, and a path shorter than the match pattern will match only if the path is a directory-component wise prefix of the match pattern. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 20:34:46 +01:00			`if (m == -1)`
			`/*`
			`* we cheated and did not do strncmp(), so we do`
			`* that here.`
			`*/`
			`m = strncmp(match, path, pathlen);`

Teach tree_entry_interesting() that the tree entries are sorted. When we are looking at a tree entry with pathspecs, if all the pathspecs sort strictly earlier than the entry we are currently looking at, there is no way later entries in the same tree would match our pathspecs, because the entries are sorted. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 17:51:47 +01:00			`/*`
			`* If common part matched earlier then it is a hit,`
			`* because we rejected the case where path is not a`
			`* leading directory and is shorter than match.`
			`*/`
			`if (!m)`
			`return 1;`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`}`
Teach tree_entry_interesting() that the tree entries are sorted. When we are looking at a tree entry with pathspecs, if all the pathspecs sort strictly earlier than the entry we are currently looking at, there is no way later entries in the same tree would match our pathspecs, because the entries are sorted. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 17:51:47 +01:00			`return never_interesting; /* No matches */`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`}`

			`/* A whole sub-tree went away or appeared */`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`static void show_tree(struct diff_options opt, const char prefix, struct tree_desc desc, const char base, int baselen)`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`{`
tree_entry_interesting(): allow it to say "everything is interesting" In addition to optimizing pathspecs that would never match, which was done earlier, this optimizes pathspecs that would always match (e.g. "arch/" while the traversal is already in "arch/i386/" hierarchy). This patch makes the worst case slightly more palatable, while improving average case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-22 01:00:27 +01:00			`int all_interesting = 0;`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`while (desc->size) {`
tree_entry_interesting(): allow it to say "everything is interesting" In addition to optimizing pathspecs that would never match, which was done earlier, this optimizes pathspecs that would always match (e.g. "arch/" while the traversal is already in "arch/i386/" hierarchy). This patch makes the worst case slightly more palatable, while improving average case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-22 01:00:27 +01:00			`int show;`

			`if (all_interesting)`
			`show = 1;`
			`else {`
			`show = tree_entry_interesting(desc, base, baselen,`
			`opt);`
			`if (show == 2)`
			`all_interesting = 1;`
			`}`
Set up for better tree diff optimizations This is mainly just a cleanup patch, and sets up for later changes where the tree-diff.c "interesting()" function can return more than just a yes/no value. In particular, it should be quite possible to say "no subsequent entries in this tree can possibly be interesting any more", and thus allow the callers to short-circuit the tree entirely. In fact, changing the callers to do so is trivial, and is really all this patch really does, because changing "interesting()" itself to say that nothing further is going to be interesting is definitely more complicated, considering that we may have arbitrary pathspecs. But in cleaning up the callers, this actually fixes a potential small performance issue in diff_tree(): if the second tree has a lot of uninterestign crud in it, we would keep on doing the "is it interesting?" check on the first tree for each uninteresting entry in the second one. The answer is obviously not going to change, so that was just not helping. The new code is clearer and simpler and avoids this issue entirely. I also renamed "interesting()" to "tree_entry_interesting()", because I got frustrated by the fact that - we actually had another function called "interesting()" in another file, and I couldn't tell from the profiles which one was the one that mattered more. - when rewriting it to return a ternary value, you can't just do if (interesting(...)) ... any more, but want to assign the return value to a local variable. The name of choice for that variable would normally be "interesting", so I just wanted to make the function name be more specific, and avoid that whole issue (even though I then didn't choose that name for either of the users, just to avoid confusion in the patch itself ;) In other words, this doesn't really change anything, but I think it's a good thing to do, and if somebody comes along and writes the logic for "yeah, none of the pathspecs you have are interesting", we now support that trivially. It could easily be a meaningful optimization for things like "blame", where there's just one pathspec, and stopping when you've seen it would allow you to avoid about 50% of the tree traversals on average. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 23:18:30 +01:00			`if (show < 0)`
			`break;`
			`if (show)`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`show_entry(opt, prefix, desc, base, baselen);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`update_tree_entry(desc);`
			`}`
			`}`

			`/* A file entry went away or appeared */`
Make show_entry void Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-14 22:39:27 +02:00			`static void show_entry(struct diff_options opt, const char prefix, struct tree_desc *desc,`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`const char *base, int baselen)`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`{`
			`unsigned mode;`
			`const char *path;`
Make the "struct tree_desc" operations available to others We have operations to "extract" and "update" a "struct tree_desc", but we only used them in tree-diff.c and they were static to that file. But other tree traversal functions can use them to their advantage Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-01-31 23:10:56 +01:00			`const unsigned char *sha1 = tree_entry_extract(desc, &path, &mode);`
Fix buffer overflow in git diff If PATH_MAX on your system is smaller than a path stored, it may cause buffer overflow and stack corruption in diff_addremove() and diff_change() functions when running git-diff Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-16 16:54:02 +02:00			`int pathlen = tree_entry_len(path, sha1);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00
Make the diff_options bitfields be an unsigned with explicit masks. reverse_diff was a bit-value in disguise, it's merged in the flags now. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 20:05:14 +01:00			`if (DIFF_OPT_TST(opt, RECURSIVE) && S_ISDIR(mode)) {`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`char *newbase = malloc_base(base, baselen, path, pathlen);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`struct tree_desc inner;`
			`void *tree;`
Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`unsigned long size;`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00
Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`tree = read_sha1_file(sha1, &type, &size);`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`if (!tree \|\| type != OBJ_TREE)`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`die("corrupt tree sha %s", sha1_to_hex(sha1));`

Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`init_tree_desc(&inner, tree, size);`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`show_tree(opt, prefix, &inner, newbase, baselen + 1 + pathlen);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00
			`free(tree);`
			`free(newbase);`
Make show_entry void Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-14 22:39:27 +02:00			`} else {`
Fix buffer overflow in git diff If PATH_MAX on your system is smaller than a path stored, it may cause buffer overflow and stack corruption in diff_addremove() and diff_change() functions when running git-diff Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-16 16:54:02 +02:00			`char *fullname = malloc_fullname(base, baselen, path, pathlen);`
			`opt->add_remove(opt, prefix[0], mode, sha1, fullname);`
			`free(fullname);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`}`
			`}`

Set up for better tree diff optimizations This is mainly just a cleanup patch, and sets up for later changes where the tree-diff.c "interesting()" function can return more than just a yes/no value. In particular, it should be quite possible to say "no subsequent entries in this tree can possibly be interesting any more", and thus allow the callers to short-circuit the tree entirely. In fact, changing the callers to do so is trivial, and is really all this patch really does, because changing "interesting()" itself to say that nothing further is going to be interesting is definitely more complicated, considering that we may have arbitrary pathspecs. But in cleaning up the callers, this actually fixes a potential small performance issue in diff_tree(): if the second tree has a lot of uninterestign crud in it, we would keep on doing the "is it interesting?" check on the first tree for each uninteresting entry in the second one. The answer is obviously not going to change, so that was just not helping. The new code is clearer and simpler and avoids this issue entirely. I also renamed "interesting()" to "tree_entry_interesting()", because I got frustrated by the fact that - we actually had another function called "interesting()" in another file, and I couldn't tell from the profiles which one was the one that mattered more. - when rewriting it to return a ternary value, you can't just do if (interesting(...)) ... any more, but want to assign the return value to a local variable. The name of choice for that variable would normally be "interesting", so I just wanted to make the function name be more specific, and avoid that whole issue (even though I then didn't choose that name for either of the users, just to avoid confusion in the patch itself ;) In other words, this doesn't really change anything, but I think it's a good thing to do, and if somebody comes along and writes the logic for "yeah, none of the pathspecs you have are interesting", we now support that trivially. It could easily be a meaningful optimization for things like "blame", where there's just one pathspec, and stopping when you've seen it would allow you to avoid about 50% of the tree traversals on average. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 23:18:30 +01:00			`static void skip_uninteresting(struct tree_desc t, const char base, int baselen, struct diff_options *opt)`
			`{`
tree_entry_interesting(): allow it to say "everything is interesting" In addition to optimizing pathspecs that would never match, which was done earlier, this optimizes pathspecs that would always match (e.g. "arch/" while the traversal is already in "arch/i386/" hierarchy). This patch makes the worst case slightly more palatable, while improving average case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-22 01:00:27 +01:00			`int all_interesting = 0;`
Set up for better tree diff optimizations This is mainly just a cleanup patch, and sets up for later changes where the tree-diff.c "interesting()" function can return more than just a yes/no value. In particular, it should be quite possible to say "no subsequent entries in this tree can possibly be interesting any more", and thus allow the callers to short-circuit the tree entirely. In fact, changing the callers to do so is trivial, and is really all this patch really does, because changing "interesting()" itself to say that nothing further is going to be interesting is definitely more complicated, considering that we may have arbitrary pathspecs. But in cleaning up the callers, this actually fixes a potential small performance issue in diff_tree(): if the second tree has a lot of uninterestign crud in it, we would keep on doing the "is it interesting?" check on the first tree for each uninteresting entry in the second one. The answer is obviously not going to change, so that was just not helping. The new code is clearer and simpler and avoids this issue entirely. I also renamed "interesting()" to "tree_entry_interesting()", because I got frustrated by the fact that - we actually had another function called "interesting()" in another file, and I couldn't tell from the profiles which one was the one that mattered more. - when rewriting it to return a ternary value, you can't just do if (interesting(...)) ... any more, but want to assign the return value to a local variable. The name of choice for that variable would normally be "interesting", so I just wanted to make the function name be more specific, and avoid that whole issue (even though I then didn't choose that name for either of the users, just to avoid confusion in the patch itself ;) In other words, this doesn't really change anything, but I think it's a good thing to do, and if somebody comes along and writes the logic for "yeah, none of the pathspecs you have are interesting", we now support that trivially. It could easily be a meaningful optimization for things like "blame", where there's just one pathspec, and stopping when you've seen it would allow you to avoid about 50% of the tree traversals on average. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 23:18:30 +01:00			`while (t->size) {`
tree_entry_interesting(): allow it to say "everything is interesting" In addition to optimizing pathspecs that would never match, which was done earlier, this optimizes pathspecs that would always match (e.g. "arch/" while the traversal is already in "arch/i386/" hierarchy). This patch makes the worst case slightly more palatable, while improving average case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-22 01:00:27 +01:00			`int show;`

			`if (all_interesting)`
			`show = 1;`
			`else {`
			`show = tree_entry_interesting(t, base, baselen, opt);`
			`if (show == 2)`
			`all_interesting = 1;`
			`}`
Set up for better tree diff optimizations This is mainly just a cleanup patch, and sets up for later changes where the tree-diff.c "interesting()" function can return more than just a yes/no value. In particular, it should be quite possible to say "no subsequent entries in this tree can possibly be interesting any more", and thus allow the callers to short-circuit the tree entirely. In fact, changing the callers to do so is trivial, and is really all this patch really does, because changing "interesting()" itself to say that nothing further is going to be interesting is definitely more complicated, considering that we may have arbitrary pathspecs. But in cleaning up the callers, this actually fixes a potential small performance issue in diff_tree(): if the second tree has a lot of uninterestign crud in it, we would keep on doing the "is it interesting?" check on the first tree for each uninteresting entry in the second one. The answer is obviously not going to change, so that was just not helping. The new code is clearer and simpler and avoids this issue entirely. I also renamed "interesting()" to "tree_entry_interesting()", because I got frustrated by the fact that - we actually had another function called "interesting()" in another file, and I couldn't tell from the profiles which one was the one that mattered more. - when rewriting it to return a ternary value, you can't just do if (interesting(...)) ... any more, but want to assign the return value to a local variable. The name of choice for that variable would normally be "interesting", so I just wanted to make the function name be more specific, and avoid that whole issue (even though I then didn't choose that name for either of the users, just to avoid confusion in the patch itself ;) In other words, this doesn't really change anything, but I think it's a good thing to do, and if somebody comes along and writes the logic for "yeah, none of the pathspecs you have are interesting", we now support that trivially. It could easily be a meaningful optimization for things like "blame", where there's just one pathspec, and stopping when you've seen it would allow you to avoid about 50% of the tree traversals on average. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 23:18:30 +01:00			`if (!show) {`
			`update_tree_entry(t);`
			`continue;`
			`}`
			`/* Skip it all? */`
			`if (show < 0)`
			`t->size = 0;`
			`return;`
			`}`
			`}`

Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`int diff_tree(struct tree_desc t1, struct tree_desc t2, const char base, struct diff_options opt)`
			`{`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`int baselen = strlen(base);`

Set up for better tree diff optimizations This is mainly just a cleanup patch, and sets up for later changes where the tree-diff.c "interesting()" function can return more than just a yes/no value. In particular, it should be quite possible to say "no subsequent entries in this tree can possibly be interesting any more", and thus allow the callers to short-circuit the tree entirely. In fact, changing the callers to do so is trivial, and is really all this patch really does, because changing "interesting()" itself to say that nothing further is going to be interesting is definitely more complicated, considering that we may have arbitrary pathspecs. But in cleaning up the callers, this actually fixes a potential small performance issue in diff_tree(): if the second tree has a lot of uninterestign crud in it, we would keep on doing the "is it interesting?" check on the first tree for each uninteresting entry in the second one. The answer is obviously not going to change, so that was just not helping. The new code is clearer and simpler and avoids this issue entirely. I also renamed "interesting()" to "tree_entry_interesting()", because I got frustrated by the fact that - we actually had another function called "interesting()" in another file, and I couldn't tell from the profiles which one was the one that mattered more. - when rewriting it to return a ternary value, you can't just do if (interesting(...)) ... any more, but want to assign the return value to a local variable. The name of choice for that variable would normally be "interesting", so I just wanted to make the function name be more specific, and avoid that whole issue (even though I then didn't choose that name for either of the users, just to avoid confusion in the patch itself ;) In other words, this doesn't really change anything, but I think it's a good thing to do, and if somebody comes along and writes the logic for "yeah, none of the pathspecs you have are interesting", we now support that trivially. It could easily be a meaningful optimization for things like "blame", where there's just one pathspec, and stopping when you've seen it would allow you to avoid about 50% of the tree traversals on average. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 23:18:30 +01:00			`for (;;) {`
Make the diff_options bitfields be an unsigned with explicit masks. reverse_diff was a bit-value in disguise, it's merged in the flags now. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 20:05:14 +01:00			`if (DIFF_OPT_TST(opt, QUIET) && DIFF_OPT_TST(opt, HAS_CHANGES))`
Teach --quiet to diff backends. This teaches git-diff-files, git-diff-index and git-diff-tree backends to exit early under --quiet option. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-14 19:12:51 +01:00			`break;`
Set up for better tree diff optimizations This is mainly just a cleanup patch, and sets up for later changes where the tree-diff.c "interesting()" function can return more than just a yes/no value. In particular, it should be quite possible to say "no subsequent entries in this tree can possibly be interesting any more", and thus allow the callers to short-circuit the tree entirely. In fact, changing the callers to do so is trivial, and is really all this patch really does, because changing "interesting()" itself to say that nothing further is going to be interesting is definitely more complicated, considering that we may have arbitrary pathspecs. But in cleaning up the callers, this actually fixes a potential small performance issue in diff_tree(): if the second tree has a lot of uninterestign crud in it, we would keep on doing the "is it interesting?" check on the first tree for each uninteresting entry in the second one. The answer is obviously not going to change, so that was just not helping. The new code is clearer and simpler and avoids this issue entirely. I also renamed "interesting()" to "tree_entry_interesting()", because I got frustrated by the fact that - we actually had another function called "interesting()" in another file, and I couldn't tell from the profiles which one was the one that mattered more. - when rewriting it to return a ternary value, you can't just do if (interesting(...)) ... any more, but want to assign the return value to a local variable. The name of choice for that variable would normally be "interesting", so I just wanted to make the function name be more specific, and avoid that whole issue (even though I then didn't choose that name for either of the users, just to avoid confusion in the patch itself ;) In other words, this doesn't really change anything, but I think it's a good thing to do, and if somebody comes along and writes the logic for "yeah, none of the pathspecs you have are interesting", we now support that trivially. It could easily be a meaningful optimization for things like "blame", where there's just one pathspec, and stopping when you've seen it would allow you to avoid about 50% of the tree traversals on average. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 23:18:30 +01:00			`if (opt->nr_paths) {`
			`skip_uninteresting(t1, base, baselen, opt);`
			`skip_uninteresting(t2, base, baselen, opt);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`}`
			`if (!t1->size) {`
Set up for better tree diff optimizations This is mainly just a cleanup patch, and sets up for later changes where the tree-diff.c "interesting()" function can return more than just a yes/no value. In particular, it should be quite possible to say "no subsequent entries in this tree can possibly be interesting any more", and thus allow the callers to short-circuit the tree entirely. In fact, changing the callers to do so is trivial, and is really all this patch really does, because changing "interesting()" itself to say that nothing further is going to be interesting is definitely more complicated, considering that we may have arbitrary pathspecs. But in cleaning up the callers, this actually fixes a potential small performance issue in diff_tree(): if the second tree has a lot of uninterestign crud in it, we would keep on doing the "is it interesting?" check on the first tree for each uninteresting entry in the second one. The answer is obviously not going to change, so that was just not helping. The new code is clearer and simpler and avoids this issue entirely. I also renamed "interesting()" to "tree_entry_interesting()", because I got frustrated by the fact that - we actually had another function called "interesting()" in another file, and I couldn't tell from the profiles which one was the one that mattered more. - when rewriting it to return a ternary value, you can't just do if (interesting(...)) ... any more, but want to assign the return value to a local variable. The name of choice for that variable would normally be "interesting", so I just wanted to make the function name be more specific, and avoid that whole issue (even though I then didn't choose that name for either of the users, just to avoid confusion in the patch itself ;) In other words, this doesn't really change anything, but I think it's a good thing to do, and if somebody comes along and writes the logic for "yeah, none of the pathspecs you have are interesting", we now support that trivially. It could easily be a meaningful optimization for things like "blame", where there's just one pathspec, and stopping when you've seen it would allow you to avoid about 50% of the tree traversals on average. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 23:18:30 +01:00			`if (!t2->size)`
			`break;`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`show_entry(opt, "+", t2, base, baselen);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`update_tree_entry(t2);`
			`continue;`
			`}`
			`if (!t2->size) {`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`show_entry(opt, "-", t1, base, baselen);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`update_tree_entry(t1);`
			`continue;`
			`}`
Avoid unnecessary strlen() calls This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down wih the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>" Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-18 04:06:24 +01:00			`switch (compare_tree_entry(t1, t2, base, baselen, opt)) {`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`case -1:`
			`update_tree_entry(t1);`
			`continue;`
			`case 0:`
			`update_tree_entry(t1);`
			`/* Fallthrough */`
			`case 1:`
			`update_tree_entry(t2);`
			`continue;`
			`}`
'git foo' program identifies itself without dash in die() messages This is a mechanical conversion of all '*.c' files with: s/((?:die\|error\|warning)\("git)-(\S+:)/$1 $2/; The result was manually inspected and no false positive was found. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-31 18:39:19 +02:00			`die("git diff-tree: internal error");`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`}`
			`return 0;`
			`}`

Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00			`/*`
			`* Does it look like the resulting diff might be due to a rename?`
			`* - single entry`
			`* - not a valid previous file`
			`*/`
			`static inline int diff_might_be_rename(void)`
			`{`
			`return diff_queued_diff.nr == 1 &&`
			`!DIFF_FILE_VALID(diff_queued_diff.queue[0]->one);`
			`}`

			`static void try_to_follow_renames(struct tree_desc t1, struct tree_desc t2, const char base, struct diff_options opt)`
			`{`
			`struct diff_options diff_opts;`
Fix up "git log --follow" a bit.. This fixes "git log --follow" to hopefully not leak memory any more, and also cleans it up a bit to look more like some of the other functions that use "diff_queued_diff" (by not using it directly as a global in the code, but by instead just taking a pointer to the diff queue and using that). As to "diff_queued_diff", I think it would be better off not as a global at all, but as being just an entry in the "struct diff_options" structure, but that's a separate issue, and there may be some subtle reason for why it's currently a global. Anyway, no real changes. Instead of having a magical first entry in the diff-queue, we now end up just keeping the diff-queue clean, and keeping our "preferred" file pairing in an internal "choice" variable. That makes it easy to switch the choice around when we find a better one. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-21 19:22:59 +02:00			`struct diff_queue_struct *q = &diff_queued_diff;`
			`struct diff_filepair *choice;`
			`const char *paths[1];`
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00			`int i;`

Fix up "git log --follow" a bit.. This fixes "git log --follow" to hopefully not leak memory any more, and also cleans it up a bit to look more like some of the other functions that use "diff_queued_diff" (by not using it directly as a global in the code, but by instead just taking a pointer to the diff queue and using that). As to "diff_queued_diff", I think it would be better off not as a global at all, but as being just an entry in the "struct diff_options" structure, but that's a separate issue, and there may be some subtle reason for why it's currently a global. Anyway, no real changes. Instead of having a magical first entry in the diff-queue, we now end up just keeping the diff-queue clean, and keeping our "preferred" file pairing in an internal "choice" variable. That makes it easy to switch the choice around when we find a better one. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-21 19:22:59 +02:00			`/* Remove the file creation entry from the diff queue, and remember it */`
			`choice = q->queue[0];`
			`q->nr = 0;`

Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00			`diff_setup(&diff_opts);`
Make the diff_options bitfields be an unsigned with explicit masks. reverse_diff was a bit-value in disguise, it's merged in the flags now. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 20:05:14 +01:00			`DIFF_OPT_SET(&diff_opts, RECURSIVE);`
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00			`diff_opts.detect_rename = DIFF_DETECT_RENAME;`
			`diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;`
			`diff_opts.single_follow = opt->paths[0];`
Fix diffcore-break total breakage Ok, so on the kernel list, some people noticed that "git log --follow" doesn't work too well with some files in the x86 merge, because a lot of files got renamed in very special ways. In particular, there was a pattern of doing single commits with renames that looked basically like - rename "filename.h" -> "filename_64.h" - create new "filename.c" that includes "filename_32.h" or "filename_64.h" depending on whether we're 32-bit or 64-bit. which was preparatory for smushing the two trees together. Now, there's two issues here: - "filename.c" remained. Yes, it was a rename, but there was a new file created with the old name in the same commit. This was important, because we wanted each commit to compile properly, so that it was bisectable, so splitting the rename into one commit and the "create helper file" into another was not an option. So we need to break associations where the contents change too much. Fine. We have the -B flag for that. When we break things up, then the rename detection will be able to figure out whether there are better alternatives. - "git log --follow" didn't with with -B. Now, the second case was really simple: we use a different "diffopt" structure for the rename detection than the basic one (which we use for showing the diffs). So that second case is trivially fixed by a trivial one-liner that just copies the break_opt values from the "real" diffopts to the one used for rename following. So now "git log -B --follow" works fine: diff --git a/tree-diff.c b/tree-diff.c index 26bdbdd..7c261fd 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -319,6 +319,7 @@ static void try_to_follow_renames(struct tree_desc t1, struct tree_desc t2, co diff_opts.detect_rename = DIFF_DETECT_RENAME; diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT; diff_opts.single_follow = opt->paths[0]; + diff_opts.break_opt = opt->break_opt; paths[0] = NULL; diff_tree_setup_paths(paths, &diff_opts); if (diff_setup_done(&diff_opts) < 0) however, the end result does not work. Because our diffcore-break.c logic is totally bogus! In particular: - it used to do if (base_size < MINIMUM_BREAK_SIZE) return 0; /* we do not break too small filepair / which basically says "don't bother to break small files". But that "base_size" is the smaller* of the two sizes, which means that if some large file was rewritten into one that just includes another file, we would look at the (small) result, and decide that it's smaller than the break size, so it cannot be worth it to break it up! Even if the other side was ten times bigger and looked nothing like the samell file! That's clearly bogus. I replaced "base_size" with "max_size", so that we compare the bigger of the filepair with the break size. - It calculated a "merge_score", which was the score needed to merge it back together if nothing else wanted it. But even if it was so different that we would never want to merge it back, we wouldn't consider it a break! That makes no sense. So I added if (merge_score_p > break_score) return 1; to make it clear that if we wouldn't want to merge it at the end, it was definitely* a break. - It compared the whole "extent of damage", counting all inserts and deletes, but it based this score on the "base_size", and generated the damage score with delta_size = src_removed + literal_added; damage_score = delta_size * MAX_SCORE / base_size; but that makes no sense either, since quite often, this will result in a number that is bigger than MAX_SCORE! Why? Because base_size is (again) the smaller of the two files we compare, and when you start out from a small file and add a lot (or start out from a large file and remove a lot), the base_size is going to be much smaller than the damage! Again, the fix was to replace "base_size" with "max_size", at which point the damage actually becomes a sane percentage of the whole. With these changes in place, not only does "git log -B --follow" work for the case that triggered this in the first place, ie now git log -B --follow arch/x86/kernel/vmlinux_64.lds.S actually gives reasonable results. But I also wanted to verify it in general, by doing a full-history git log --stat -B -C on my kernel tree with the old code and the new code. There's some tweaking to be done, but generally, the new code generates much better results wrt breaking up files (and then finding better rename candidates). Here's a few examples of the "--stat" output: - This: include/asm-x86/Kbuild \| 2 - include/asm-x86/debugreg.h \| 79 +++++++++++++++++++++++++++++++++++------ include/asm-x86/debugreg_32.h \| 64 --------------------------------- include/asm-x86/debugreg_64.h \| 65 --------------------------------- 4 files changed, 68 insertions(+), 142 deletions(-) Becomes: include/asm-x86/Kbuild \| 2 - include/asm-x86/{debugreg_64.h => debugreg.h} \| 9 +++- include/asm-x86/debugreg_32.h \| 64 ------------------------- 3 files changed, 7 insertions(+), 68 deletions(-) - This: include/asm-x86/bug.h \| 41 +++++++++++++++++++++++++++++++++++++++-- include/asm-x86/bug_32.h \| 37 ------------------------------------- include/asm-x86/bug_64.h \| 34 ---------------------------------- 3 files changed, 39 insertions(+), 73 deletions(-) Becomes include/asm-x86/{bug_64.h => bug.h} \| 20 +++++++++++++----- include/asm-x86/bug_32.h \| 37 ----------------------------------- 2 files changed, 14 insertions(+), 43 deletions(-) Now, in some other cases, it does actually turn a rename into a real "delete+create" pair, and then the diff is usually bigger, so truth in advertizing: it doesn't always generate a nicer diff. But for what -B was meant for, I think this is a big improvement, and I suspect those cases where it generates a bigger diff are tweakable. So I think this diff fixes a real bug, but we might still want to tweak the default values and perhaps the exact rules for when a break happens. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-20 21:31:31 +02:00			`diff_opts.break_opt = opt->break_opt;`
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00			`paths[0] = NULL;`
			`diff_tree_setup_paths(paths, &diff_opts);`
			`if (diff_setup_done(&diff_opts) < 0)`
			`die("unable to set up diff options to follow renames");`
			`diff_tree(t1, t2, base, &diff_opts);`
			`diffcore_std(&diff_opts);`
Fix small memory leaks induced by diff_tree_setup_paths Run diff_tree_release_paths in the appropriate places, and add a test to avoid NULL dereference. Better safe than sorry. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-11 22:59:55 +01:00			`diff_tree_release_paths(&diff_opts);`
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00
Fix up "git log --follow" a bit.. This fixes "git log --follow" to hopefully not leak memory any more, and also cleans it up a bit to look more like some of the other functions that use "diff_queued_diff" (by not using it directly as a global in the code, but by instead just taking a pointer to the diff queue and using that). As to "diff_queued_diff", I think it would be better off not as a global at all, but as being just an entry in the "struct diff_options" structure, but that's a separate issue, and there may be some subtle reason for why it's currently a global. Anyway, no real changes. Instead of having a magical first entry in the diff-queue, we now end up just keeping the diff-queue clean, and keeping our "preferred" file pairing in an internal "choice" variable. That makes it easy to switch the choice around when we find a better one. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-21 19:22:59 +02:00			`/* Go through the new set of filepairing, and see if we find a more interesting one */`
			`for (i = 0; i < q->nr; i++) {`
			`struct diff_filepair *p = q->queue[i];`
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00
			`/*`
			`* Found a source? Not only do we use that for the new`
Fix up "git log --follow" a bit.. This fixes "git log --follow" to hopefully not leak memory any more, and also cleans it up a bit to look more like some of the other functions that use "diff_queued_diff" (by not using it directly as a global in the code, but by instead just taking a pointer to the diff queue and using that). As to "diff_queued_diff", I think it would be better off not as a global at all, but as being just an entry in the "struct diff_options" structure, but that's a separate issue, and there may be some subtle reason for why it's currently a global. Anyway, no real changes. Instead of having a magical first entry in the diff-queue, we now end up just keeping the diff-queue clean, and keeping our "preferred" file pairing in an internal "choice" variable. That makes it easy to switch the choice around when we find a better one. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-21 19:22:59 +02:00			`* diff_queued_diff, we will also use that as the path in`
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00			`* the future!`
			`*/`
			`if ((p->status == 'R' \|\| p->status == 'C') && !strcmp(p->two->path, opt->paths[0])) {`
Fix up "git log --follow" a bit.. This fixes "git log --follow" to hopefully not leak memory any more, and also cleans it up a bit to look more like some of the other functions that use "diff_queued_diff" (by not using it directly as a global in the code, but by instead just taking a pointer to the diff queue and using that). As to "diff_queued_diff", I think it would be better off not as a global at all, but as being just an entry in the "struct diff_options" structure, but that's a separate issue, and there may be some subtle reason for why it's currently a global. Anyway, no real changes. Instead of having a magical first entry in the diff-queue, we now end up just keeping the diff-queue clean, and keeping our "preferred" file pairing in an internal "choice" variable. That makes it easy to switch the choice around when we find a better one. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-21 19:22:59 +02:00			`/* Switch the file-pairs around */`
			`q->queue[i] = choice;`
			`choice = p;`

			`/* Update the path we use from now on.. */`
Fix small memory leaks induced by diff_tree_setup_paths Run diff_tree_release_paths in the appropriate places, and add a test to avoid NULL dereference. Better safe than sorry. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-11 22:59:55 +01:00			`diff_tree_release_paths(opt);`
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00			`opt->paths[0] = xstrdup(p->one->path);`
			`diff_tree_setup_paths(opt->paths, opt);`
			`break;`
			`}`
			`}`

			`/*`
Fix up "git log --follow" a bit.. This fixes "git log --follow" to hopefully not leak memory any more, and also cleans it up a bit to look more like some of the other functions that use "diff_queued_diff" (by not using it directly as a global in the code, but by instead just taking a pointer to the diff queue and using that). As to "diff_queued_diff", I think it would be better off not as a global at all, but as being just an entry in the "struct diff_options" structure, but that's a separate issue, and there may be some subtle reason for why it's currently a global. Anyway, no real changes. Instead of having a magical first entry in the diff-queue, we now end up just keeping the diff-queue clean, and keeping our "preferred" file pairing in an internal "choice" variable. That makes it easy to switch the choice around when we find a better one. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-21 19:22:59 +02:00			`* Then, discard all the non-relevane file pairs...`
			`*/`
			`for (i = 0; i < q->nr; i++) {`
			`struct diff_filepair *p = q->queue[i];`
			`diff_free_filepair(p);`
			`}`

			`/*`
			`* .. and re-instate the one we want (which might be either the`
			`* original one, or the rename/copy we found)`
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00			`*/`
Fix up "git log --follow" a bit.. This fixes "git log --follow" to hopefully not leak memory any more, and also cleans it up a bit to look more like some of the other functions that use "diff_queued_diff" (by not using it directly as a global in the code, but by instead just taking a pointer to the diff queue and using that). As to "diff_queued_diff", I think it would be better off not as a global at all, but as being just an entry in the "struct diff_options" structure, but that's a separate issue, and there may be some subtle reason for why it's currently a global. Anyway, no real changes. Instead of having a magical first entry in the diff-queue, we now end up just keeping the diff-queue clean, and keeping our "preferred" file pairing in an internal "choice" variable. That makes it easy to switch the choice around when we find a better one. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-21 19:22:59 +02:00			`q->queue[0] = choice;`
			`q->nr = 1;`
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00			`}`

Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`int diff_tree_sha1(const unsigned char old, const unsigned char new, const char base, struct diff_options opt)`
			`{`
			`void tree1, tree2;`
			`struct tree_desc t1, t2;`
Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`unsigned long size1, size2;`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`int retval;`

Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`tree1 = read_object_with_reference(old, tree_type, &size1, NULL);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`if (!tree1)`
			`die("unable to read source tree (%s)", sha1_to_hex(old));`
Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`tree2 = read_object_with_reference(new, tree_type, &size2, NULL);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`if (!tree2)`
			`die("unable to read destination tree (%s)", sha1_to_hex(new));`
Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`init_tree_desc(&t1, tree1, size1);`
			`init_tree_desc(&t2, tree2, size2);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`retval = diff_tree(&t1, &t2, base, opt);`
Make the diff_options bitfields be an unsigned with explicit masks. reverse_diff was a bit-value in disguise, it's merged in the flags now. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 20:05:14 +01:00			`if (DIFF_OPT_TST(opt, FOLLOW_RENAMES) && diff_might_be_rename()) {`
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show that particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is not carrying around a "commit,pathname" kind of pair and it's not going to be able to notice the file coming from multiple different files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-19 23:22:46 +02:00			`init_tree_desc(&t1, tree1, size1);`
			`init_tree_desc(&t2, tree2, size2);`
			`try_to_follow_renames(&t1, &t2, base, opt);`
			`}`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`free(tree1);`
			`free(tree2);`
			`return retval;`
			`}`

Make git-cherry handle root trees This patch on top of 'next' makes built-in git-cherry handle root commits. It moves the static function log-tree.c::diff_root_tree() to tree-diff.c and makes it more similar to diff_tree_sha1() by shuffling around arguments and factoring out the call to log_tree_diff_flush(). Consequently the name is changed to diff_root_tree_sha1(). It is a version of diff_tree_sha1() that compares the empty tree (= root tree) against a single 'real' tree. This function is then used in get_patch_id() to compute patch IDs for initial commits instead of SEGFAULTing, as the current code does if confronted with parentless commits. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 18:52:39 +02:00			`int diff_root_tree_sha1(const unsigned char new, const char base, struct diff_options *opt)`
			`{`
			`int retval;`
			`void *tree;`
Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`unsigned long size;`
Make git-cherry handle root trees This patch on top of 'next' makes built-in git-cherry handle root commits. It moves the static function log-tree.c::diff_root_tree() to tree-diff.c and makes it more similar to diff_tree_sha1() by shuffling around arguments and factoring out the call to log_tree_diff_flush(). Consequently the name is changed to diff_root_tree_sha1(). It is a version of diff_tree_sha1() that compares the empty tree (= root tree) against a single 'real' tree. This function is then used in get_patch_id() to compute patch IDs for initial commits instead of SEGFAULTing, as the current code does if confronted with parentless commits. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 18:52:39 +02:00			`struct tree_desc empty, real;`

Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`tree = read_object_with_reference(new, tree_type, &size, NULL);`
Make git-cherry handle root trees This patch on top of 'next' makes built-in git-cherry handle root commits. It moves the static function log-tree.c::diff_root_tree() to tree-diff.c and makes it more similar to diff_tree_sha1() by shuffling around arguments and factoring out the call to log_tree_diff_flush(). Consequently the name is changed to diff_root_tree_sha1(). It is a version of diff_tree_sha1() that compares the empty tree (= root tree) against a single 'real' tree. This function is then used in get_patch_id() to compute patch IDs for initial commits instead of SEGFAULTing, as the current code does if confronted with parentless commits. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 18:52:39 +02:00			`if (!tree)`
			`die("unable to read root tree (%s)", sha1_to_hex(new));`
Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`init_tree_desc(&real, tree, size);`
Make git-cherry handle root trees This patch on top of 'next' makes built-in git-cherry handle root commits. It moves the static function log-tree.c::diff_root_tree() to tree-diff.c and makes it more similar to diff_tree_sha1() by shuffling around arguments and factoring out the call to log_tree_diff_flush(). Consequently the name is changed to diff_root_tree_sha1(). It is a version of diff_tree_sha1() that compares the empty tree (= root tree) against a single 'real' tree. This function is then used in get_patch_id() to compute patch IDs for initial commits instead of SEGFAULTing, as the current code does if confronted with parentless commits. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 18:52:39 +02:00
Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`init_tree_desc(&empty, "", 0);`
Make git-cherry handle root trees This patch on top of 'next' makes built-in git-cherry handle root commits. It moves the static function log-tree.c::diff_root_tree() to tree-diff.c and makes it more similar to diff_tree_sha1() by shuffling around arguments and factoring out the call to log_tree_diff_flush(). Consequently the name is changed to diff_root_tree_sha1(). It is a version of diff_tree_sha1() that compares the empty tree (= root tree) against a single 'real' tree. This function is then used in get_patch_id() to compute patch IDs for initial commits instead of SEGFAULTing, as the current code does if confronted with parentless commits. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-26 18:52:39 +02:00			`retval = diff_tree(&empty, &real, base, opt);`
			`free(tree);`
			`return retval;`
			`}`

Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`static int count_paths(const char **paths)`
			`{`
			`int i = 0;`
			`while (*paths++)`
			`i++;`
			`return i;`
			`}`

tree-diff: do not assume we use only one pathspec The way tree-diff was set up assumed we would use only one set of pathspec during the entire life of the program. Move the pathspec related static variables out to diff_options structure so that we can filter commits with one set of paths while show the actual diffs using different set of paths. I suspect this breaks blame.c, and makes "git log paths..." to default to the --full-diff, the latter of which is dealt with the next commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-11 01:39:11 +02:00			`void diff_tree_release_paths(struct diff_options *opt)`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`{`
tree-diff: do not assume we use only one pathspec The way tree-diff was set up assumed we would use only one set of pathspec during the entire life of the program. Move the pathspec related static variables out to diff_options structure so that we can filter commits with one set of paths while show the actual diffs using different set of paths. I suspect this breaks blame.c, and makes "git log paths..." to default to the --full-diff, the latter of which is dealt with the next commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-11 01:39:11 +02:00			`free(opt->pathlens);`
			`}`

			`void diff_tree_setup_paths(const char *p, struct diff_options opt)`
			`{`
			`opt->nr_paths = 0;`
			`opt->pathlens = NULL;`
			`opt->paths = NULL;`

Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`if (p) {`
			`int i;`

tree-diff: do not assume we use only one pathspec The way tree-diff was set up assumed we would use only one set of pathspec during the entire life of the program. Move the pathspec related static variables out to diff_options structure so that we can filter commits with one set of paths while show the actual diffs using different set of paths. I suspect this breaks blame.c, and makes "git log paths..." to default to the --full-diff, the latter of which is dealt with the next commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-11 01:39:11 +02:00			`opt->paths = p;`
			`opt->nr_paths = count_paths(p);`
			`if (opt->nr_paths == 0) {`
			`opt->pathlens = NULL;`
avoid asking ?alloc() for zero bytes. Avoid asking for zero bytes when that change simplifies overall logic. Later we would change the wrapper to ask for 1 byte on platforms that return NULL for zero byte request. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-26 21:34:56 +01:00			`return;`
			`}`
tree-diff: do not assume we use only one pathspec The way tree-diff was set up assumed we would use only one set of pathspec during the entire life of the program. Move the pathspec related static variables out to diff_options structure so that we can filter commits with one set of paths while show the actual diffs using different set of paths. I suspect this breaks blame.c, and makes "git log paths..." to default to the --full-diff, the latter of which is dealt with the next commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-11 01:39:11 +02:00			`opt->pathlens = xmalloc(opt->nr_paths * sizeof(int));`
			`for (i=0; i < opt->nr_paths; i++)`
			`opt->pathlens[i] = strlen(p[i]);`
Split up tree diff functions into tree-diff.c library This makes the tree diff functionality independent of the "git-diff-tree" program, by splitting the core functionality up into a library file. This will be needed for when we teach git-rev-list to only follow a specified set of pathnames, rather than the global revision history. Most of it is a fairly straightforward code move, but it also involves some calling convention cleanup, and moving some of the static variables from diff-tree.c into the options structure. The actual tree change callback routines also become paramterized by the diff_options structure, allowing the library functionality to do something else than just show the diff on stdout. Right now the only user of this functionality remains git-diff-tree itself. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-21 06:05:05 +02:00			`}`
			`}`