
mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-15 05:33:04 +01:00

860 lines

22 KiB

C


tree/diff header cleanup. Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-30 08:55:43 +02:00			`#include "cache.h"`
			`#include "tree-walk.h"`
unpack_trees: group error messages by type When an error is encountered, it calls add_rejected_file() which either - directly displays the error message and stops if in plumbing mode (i.e. if show_all_errors is not set to 1) - or stores it so that it will be displayed at the end with display_error_msgs(). Storing the files by error type makes it possible to list all files that share the same error instead of printing a series of almost identical errors. As each bind_overlap error combines a file and an old file, such a list cannot be built; therefore, these errors are not stored but displayed directly. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-11 10:38:07 +02:00			`#include "unpack-trees.h"`
tree_entry_interesting(): support depth limit This is needed to replace pathspec_matches() in builtin/grep.c. max_depth == -1 means infinite depth. Depth limit is only effective when pathspec.recursive == 1. When pathspec.recursive == 0, the behavior depends on match functions: non-recursive for tree_entry_interesting() and recursive for match_pathspec{,_depth} Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:44 +01:00			`#include "dir.h"`
Use blob_, commit_, tag_, and tree_type throughout. This replaces occurrences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-02 14:44:09 +02:00			`#include "tree.h"`
move struct pathspec and related functions to pathspec.[ch] Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:35:25 +02:00			`#include "pathspec.h"`
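The tree/diff header cleanup message describes canon_mode() as turning an st_mode into the canonical value a tree entry carries for a file, symlink or directory. A minimal sketch of that idea follows, assuming the usual Git tree-mode constants (040000, 0100644/0100755, 0120000, 0160000); the `SK_*` macros and `sketch_canon_mode()` are hypothetical names, not the actual cache.h definitions.

```c
#include <assert.h>

/* Assumed Git tree-mode constants (octal); hypothetical macro names. */
#define SK_IFMT     0170000u  /* file-type bits */
#define SK_IFDIR    0040000u  /* subtree */
#define SK_IFREG    0100000u  /* regular file */
#define SK_IFLNK    0120000u  /* symbolic link */
#define SK_IFGITLNK 0160000u  /* submodule ("gitlink") */

/* Hypothetical sketch of canon_mode(): map any mode to the canonical
 * value a tree entry carries. */
static unsigned int sketch_canon_mode(unsigned int mode)
{
	switch (mode & SK_IFMT) {
	case SK_IFREG:
		/* keep only the owner-execute bit: 0755 or 0644 */
		return SK_IFREG | ((mode & 0100) ? 0755u : 0644u);
	case SK_IFLNK:
		return SK_IFLNK;
	case SK_IFDIR:
		return SK_IFDIR;
	default:
		/* assumption: every other type is treated as a gitlink */
		return SK_IFGITLNK;
	}
}
```

Under these assumptions, `sketch_canon_mode(0100664)` yields 0100644 and `sketch_canon_mode(0100775)` yields 0100755, because only the owner-execute bit of a regular file survives.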
tree/diff header cleanup (Junio C Hamano, 2006-03-30)
Switch over tree descriptors to contain a pre-parsed entry This makes the tree descriptor contain a "struct name_entry" as part of it, and it gets filled in so that it always contains a valid entry. On some benchmarks, it improves performance by up to 15%. That makes tree entry "extract" trivial, and means that we only actually need to decode each tree entry just once: we decode the first one when we initialize the tree descriptor, and each subsequent one when doing "update_tree_entry()". In particular, this means that we don't need to do strlen() both at extract time _and_ at update time. Finally, it also allows more sharing of code (entry_extract(), that wanted a "struct name_entry", just got totally trivial, along with the "tree_entry()" function). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:09:56 +01:00			`static const char *get_mode(const char *str, unsigned int *modep)`
			`{`
			`unsigned char c;`
			`unsigned int mode = 0;`

tree-walk: don't parse incorrect entries The current code can access memory outside of the tree buffer in the case of malformed tree entries. This patch prevents this by: * The rest of the buffer must be at least 24 bytes (at least 1 byte mode, 1 blank, at least one byte path name, 1 NUL, 20 bytes sha1). * Check that the last NUL (21 bytes before the end) is present. This ensures that strlen() and get_mode() calls stay within the buffer. * The mode may not be empty. We only have to reject a blank at the beginning, as the rest is handled by if (c < '0' || c > '7'). * The blank is ensured by get_mode(). * The path must contain at least one character. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-06 18:21:10 +01:00			`if (*str == ' ')`
			`return NULL;`

Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`while ((c = *str++) != ' ') {`
			`if (c < '0' || c > '7')`
			`return NULL;`
			`mode = (mode << 3) + (c - '0');`
			`}`
			`*modep = mode;`
			`return str;`
			`}`
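get_mode() consumes the ASCII octal mode at the start of a raw tree entry, shifting in three bits per digit, and returns a pointer just past the separating space. Because the terminating NUL is also outside '0'..'7', the loop stops at the end of a string that has no space. The standalone copy below, under the hypothetical name `sketch_get_mode()`, makes that behavior easy to try in isolation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone copy of get_mode(): parse "<octal digits> "
 * and return a pointer to the byte after the space, or NULL. */
static const char *sketch_get_mode(const char *str, unsigned int *modep)
{
	unsigned char c;
	unsigned int mode = 0;

	if (*str == ' ')                        /* empty mode */
		return NULL;
	while ((c = *str++) != ' ') {
		if (c < '0' || c > '7')         /* non-octal digit, or the NUL */
			return NULL;
		mode = (mode << 3) + (c - '0'); /* shift in 3 bits */
	}
	*modep = mode;
	return str;                             /* first byte of the path */
}
```

For example, `sketch_get_mode("100644 hello.c", &m)` stores 0100644 in `m` and returns a pointer to `"hello.c"`, while `" 644 x"` (empty mode), `"100948 x"` (digit 9) and `"40000"` (no space before the NUL) all return NULL.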

tree-walk: don't parse incorrect entries (Martin Koegler, 2008-01-06)			`static void decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size)`
Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`{`
			`const char *path;`
			`unsigned int mode, len;`

tree-walk: don't parse incorrect entries (Martin Koegler, 2008-01-06)			`if (size < 24 || buf[size - 21])`
			`die("corrupt tree file");`

Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`path = get_mode(buf, &mode);`
tree-walk: don't parse incorrect entries (Martin Koegler, 2008-01-06)			`if (!path || !*path)`
Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`die("corrupt tree file");`
			`len = strlen(path) + 1;`

			`/* Initialize the descriptor entry */`
			`desc->entry.path = path;`
tree-walk: finally switch over tree descriptors to contain a pre-parsed entry This continues 4651ece8 (Switch over tree descriptors to contain a pre-parsed entry) and moves the only remaining computation, mode = canon_mode(mode), from tree_entry_extract() to the tree entry decode phase, i.e. to decode_tree_entry(). The reason to do it is that canon_mode() is at least 2 conditional jumps for regular files, and that could be noticeable should canon_mode() be invoked several times. That does not matter for the current Git codebase, where typical tree traversal is while (t->size) { sha1 = tree_entry_extract(t, &path, &mode); ... update_tree_entry(t); } i.e. we do t -> sha1,path,mode "extraction" only once per entry. In such cases, it does not matter performance-wise where that mode canonicalization is done, either once in tree_entry_extract() or once in decode_tree_entry() called by update_tree_entry(); the cost is approximately the same. But future code, which could need to work with several tree_desc's in parallel, may find it handy to operate on tree_desc descriptors and do "extracts" only when needed, or not at all, accessing only the relevant parts through the structure fields directly. For such situations, having canon_mode() done once in the decode phase is better: we won't need to pay the performance price of 2 extra conditional jumps on every t->mode access. So let's move mode canonicalization to decode_tree_entry(). That was the final bit. Now, after a tree entry is decoded, it is fully ready and can be accessed either directly via its fields or through tree_entry_extract(), which this time really became "totally trivial". Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-02-06 12:36:31 +01:00			`desc->entry.mode = canon_mode(mode);`
Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`desc->entry.sha1 = (const unsigned char *)(path + len);`
			`}`
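decode_tree_entry() leans on the checks from the "don't parse incorrect entries" commit: at least 24 bytes must remain, and the byte 21 from the end must be the NUL that ends the last path, so strlen() and get_mode() cannot run off the buffer. The sketch below restates those checks in standalone form and returns -1 where the real code calls die(); `struct sketch_entry`, `sketch_decode()` and its `len` field are hypothetical, and the 20-byte object name assumes SHA-1 as in the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded view of one raw tree entry. */
struct sketch_entry {
	const char *path;           /* NUL-terminated name inside the buffer */
	unsigned int mode;          /* raw octal mode, not canonicalized */
	const unsigned char *sha1;  /* 20 raw bytes after the path's NUL */
	unsigned long len;          /* bytes this entry occupies in total */
};

static const char *sketch_get_mode(const char *str, unsigned int *modep)
{
	unsigned char c;
	unsigned int mode = 0;

	if (*str == ' ')
		return NULL;
	while ((c = *str++) != ' ') {
		if (c < '0' || c > '7')
			return NULL;
		mode = (mode << 3) + (c - '0');
	}
	*modep = mode;
	return str;
}

/* Decode the entry at buf; returns 0 on success, -1 if it is malformed. */
static int sketch_decode(const char *buf, unsigned long size,
			 struct sketch_entry *e)
{
	const char *path;
	unsigned int mode = 0;
	size_t pathlen;

	/* Smallest entry is "1 x\0" plus 20 hash bytes = 24 bytes, and a NUL
	 * 21 bytes from the end bounds every strlen() below. */
	if (size < 24 || buf[size - 21])
		return -1;
	path = sketch_get_mode(buf, &mode);
	if (!path || !*path)        /* bad mode or empty path */
		return -1;
	pathlen = strlen(path) + 1; /* include the NUL */
	e->path = path;
	e->mode = mode;
	e->sha1 = (const unsigned char *)(path + pathlen);
	e->len = (unsigned long)(path + pathlen + 20 - buf);
	return 0;
}
```

Because the first NUL after the path can be no later than byte `size - 21`, the computed `len` never exceeds `size`, which is why update_tree_entry() can simply subtract it.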

Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`void init_tree_desc(struct tree_desc *desc, const void *buffer, unsigned long size)`
			`{`
			`desc->buffer = buffer;`
			`desc->size = size;`
Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`if (size)`
			`decode_tree_entry(desc, buffer, size);`
Initialize tree descriptors with a helper function rather than by hand (Linus Torvalds, 2007-03-21)			`}`

tree/diff header cleanup (Junio C Hamano, 2006-03-30)			`void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1)`
			`{`
			`unsigned long size = 0;`
			`void *buf = NULL;`

			`if (sha1) {`
Use blob_, commit_, tag_, and tree_type throughout (Peter Eriksen, 2006-04-02)			`buf = read_object_with_reference(sha1, tree_type, &size, NULL);`
tree/diff header cleanup (Junio C Hamano, 2006-03-30)			`if (!buf)`
			`die("unable to read tree %s", sha1_to_hex(sha1));`
			`}`
Initialize tree descriptors with a helper function rather than by hand (Linus Torvalds, 2007-03-21)			`init_tree_desc(desc, buf, size);`
tree/diff header cleanup (Junio C Hamano, 2006-03-30)			`return buf;`
			`}`

			`static void entry_clear(struct name_entry *a)`
			`{`
			`memset(a, 0, sizeof(*a));`
			`}`

			`static void entry_extract(struct tree_desc *t, struct name_entry *a)`
			`{`
Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`*a = t->entry;`
tree/diff header cleanup (Junio C Hamano, 2006-03-30)			`}`

			`void update_tree_entry(struct tree_desc *desc)`
			`{`
Initialize tree descriptors with a helper function rather than by hand (Linus Torvalds, 2007-03-21)			`const void *buf = desc->buffer;`
Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`const unsigned char *end = desc->entry.sha1 + 20;`
tree/diff header cleanup (Junio C Hamano, 2006-03-30)			`unsigned long size = desc->size;`
Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`unsigned long len = end - (const unsigned char *)buf;`
tree/diff header cleanup (Junio C Hamano, 2006-03-30)
			`if (size < len)`
			`die("corrupt tree file");`
Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`buf = end;`
			`size -= len;`
			`desc->buffer = buf;`
			`desc->size = size;`
			`if (size)`
			`decode_tree_entry(desc, buf, size);`
tree/diff header cleanup (Junio C Hamano, 2006-03-30)			`}`

tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry)) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-30 18:45:45 +02:00			`int tree_entry(struct tree_desc *desc, struct name_entry *entry)`
			`{`
Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`if (!desc->size)`
tree_entry(): new tree-walking helper function (Linus Torvalds, 2006-05-30)			`return 0;`

Switch over tree descriptors to contain a pre-parsed entry (Linus Torvalds, 2007-03-21)			`*entry = desc->entry;`
			`update_tree_entry(desc);`
			`return 1;`
			`}`
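tree_entry() combines "extract the current entry" and "advance the descriptor" into one call. It returns 1 while entries remain and 0 at the end, so a whole tree can be walked with a single while loop. The sketch below shows that calling convention in isolation, but it is not git's code:

- mini_desc, mini_entry and mini_tree_entry() are hypothetical.
- The descriptor is parsed on demand, rather than holding a pre-decoded entry the way git's tree_desc does.
- It assumes the classic raw tree record layout `<octal mode> SP <name> NUL <20-byte SHA-1>`.
- A malformed record simply ends the walk.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for git's name_entry and tree_desc. */
struct mini_entry {
	const char *path;          /* points into the tree buffer */
	int pathlen;
	unsigned int mode;
	const unsigned char *sha1; /* 20 raw bytes, also inside the buffer */
};

struct mini_desc {
	const char *buf;           /* next unread record */
	unsigned long size;        /* bytes left in the buffer */
};

/*
 * Decode one "<mode> <name>\0<20-byte sha1>" record into *e and advance
 * the descriptor past it. Returns 1 if an entry was produced, 0 when the
 * buffer is exhausted or the record is malformed.
 */
static int mini_tree_entry(struct mini_desc *desc, struct mini_entry *e)
{
	const char *sp, *nul;
	unsigned long used;

	if (!desc->size)
		return 0;
	sp = memchr(desc->buf, ' ', desc->size);
	if (!sp)
		return 0;
	nul = memchr(sp + 1, '\0', desc->size - (sp + 1 - desc->buf));
	if (!nul)
		return 0;
	used = (nul + 1 - desc->buf) + 20;
	if (used > desc->size)
		return 0;

	e->mode = (unsigned int)strtoul(desc->buf, NULL, 8); /* stops at ' ' */
	e->path = sp + 1;
	e->pathlen = (int)(nul - (sp + 1));
	e->sha1 = (const unsigned char *)(nul + 1);

	desc->buf += used;
	desc->size -= used;
	return 1;
}
```

A caller walks a tree with `while (mini_tree_entry(&desc, &entry)) { ... }`, reading entry.path, entry.pathlen, entry.mode and entry.sha1 inside the loop.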

Make 'traverse_tree()' use linked structure rather than 'const char *base' This makes the calling convention a bit less obvious, but a lot more flexible. Instead of allocating and extending a new 'base' string, we just link the top-most name into a linked list of the 'info' structure when traversing a subdirectory, and we can generate the basename by following the list. Perhaps even more importantly, the linked list of info structures also gives us a place to naturally save off other information than just the directory name. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-06 03:59:29 +01:00			`void setup_traverse_info(struct traverse_info *info, const char *base)`
			`{`
			`int pathlen = strlen(base);`
Fix tree-walking compare_entry() in the presense of --prefix When we make the "root" tree-walk info entry have a pathname in it, we need to have a ->prev pointer so that compare_entry will actually notice and traverse into the root. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-07 00:44:48 +01:00			`static struct traverse_info dummy;`

			`memset(info, 0, sizeof(*info));`
			`if (pathlen && base[pathlen-1] == '/')`
			`pathlen--;`
			`info->pathlen = pathlen ? pathlen + 1 : 0;`
			`info->name.path = base;`
			`info->name.sha1 = (void *)(base + pathlen + 1);`
			`if (pathlen)`
			`info->prev = &dummy;`
			`}`

			`char *make_traverse_path(char *path, const struct traverse_info *info, const struct name_entry *n)`
			`{`
tree-walk.c: do not leak internal structure in tree_entry_len() tree_entry_len() does not simply take two random arguments and return a tree length. The two pointers must point to a tree item structure, or struct name_entry. Passing random pointers will return incorrect value. Force callers to pass struct name_entry instead of two pointers (with hope that they don't manually construct struct name_entry themselves) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:09 +02:00			`int len = tree_entry_len(n);`
			`int pathlen = info->pathlen;`

			`path[pathlen + len] = 0;`
			`for (;;) {`
			`memcpy(path + pathlen, n->path, len);`
			`if (!pathlen)`
			`break;`
			`path[--pathlen] = '/';`
			`n = &info->name;`
			`len = tree_entry_len(n);`
			`info = info->prev;`
			`pathlen -= len;`
			`}`
			`return path;`
			`}`
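make_traverse_path() builds the full path from right to left. The leaf name goes at offset info->pathlen. Each step up the info->prev chain then writes a '/' and the parent directory's name just before it, so no intermediate string is allocated.

Here is a standalone sketch of the same loop. demo_name, demo_info, demo_make_path() and demo_descend() are hypothetical simplifications that carry no object ids or modes. As with the original, the caller must supply a buffer of at least info->pathlen + n->pathlen + 1 bytes, where info->pathlen already counts the trailing '/'.

```c
#include <string.h>

/* Hypothetical, trimmed-down versions of name_entry and traverse_info. */
struct demo_name {
	const char *path;
	int pathlen;
};

struct demo_info {
	struct demo_info *prev;   /* parent directory, NULL at the root */
	struct demo_name name;    /* this directory's own name */
	int pathlen;              /* length of "dir/.../name/" prefix, 0 at root */
};

/* Same right-to-left fill as make_traverse_path(). */
static char *demo_make_path(char *path, const struct demo_info *info,
			    const struct demo_name *n)
{
	int len = n->pathlen;
	int pathlen = info->pathlen;

	path[pathlen + len] = '\0';
	for (;;) {
		memcpy(path + pathlen, n->path, len);
		if (!pathlen)
			break;
		path[--pathlen] = '/';    /* separator before the name just copied */
		n = &info->name;          /* step up to the enclosing directory */
		len = n->pathlen;
		info = info->prev;
		pathlen -= len;
	}
	return path;
}

/* Link child under parent: its prefix is parent's prefix + dirname + '/'. */
static void demo_descend(struct demo_info *child, struct demo_info *parent,
			 const char *dirname)
{
	child->prev = parent;
	child->name.path = dirname;
	child->name.pathlen = (int)strlen(dirname);
	child->pathlen = parent->pathlen + child->name.pathlen + 1;
}
```

With a root info of pathlen 0, descending into "src" and then "lib" gives a prefix length of 8. The leaf "x.c" is then assembled as "src/lib/x.c".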

traverse_trees(): handle D/F conflict case sanely traverse_trees() is supposed to call its callback with all the matching entries from the given trees. The current algorithm keeps a pointer to each of the tree being traversed, and feeds the entry with the earliest name to the callback. This breaks down if the trees being traversed looks like this: A B t-1 t t-2 u t/a v When we are currently looking at an entry "t-1" in tree A, and tree B has returned "t", feeding "t" from the B and not feeding anything from A, only because "t-1" sorts later than "t", will miss an entry for a subtree "t" behind the current entry in tree A. This introduces extended_entry_extract() helper function that gives what name is expected from the tree, and implements a mechanism to look-ahead in the tree object using it, to make sure such a case is handled sanely. Traversal in tree A in the above example will first return "t" to match that of B, and then the next request for an entry to A then returns "t-1". This roughly corresponds to what Linus's "prepare for one-entry lookahead" wanted to do, but because this does implement look ahead, t6035 and one more test in t1012 reveal that the approach would not work without adjusting the side that walks the index in unpack_trees() as well. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-09-19 23:07:14 +02:00			`struct tree_desc_skip {`
			`struct tree_desc_skip *prev;`
			`const void *ptr;`
			`};`

			`struct tree_desc_x {`
			`struct tree_desc d;`
			`struct tree_desc_skip *skip;`
			`};`

			`static int name_compare(const char *a, int a_len,`
			`const char *b, int b_len)`
			`{`
			`int len = (a_len < b_len) ? a_len : b_len;`
			`int cmp = memcmp(a, b, len);`
			`if (cmp)`
			`return cmp;`
			`return (a_len - b_len);`
			`}`
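name_compare() orders raw byte strings. Entries inside a tree object, however, are sorted as if a subtree's name ended in '/'. That is why a subtree "t" lands after the blobs "t-1" and "t-2" ('-' is 0x2d, '/' is 0x2f) but before "t=1". The look-ahead code in extended_entry_extract() has to cope with exactly this layout.

Below is a sketch of that ordering rule, modeled on the comparison performed by git's base_name_compare(). demo_tree_entry and demo_tree_order() are hypothetical names, and an is_tree flag stands in for testing the mode with S_ISDIR().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type for illustration: a name plus a "subtree" flag. */
struct demo_tree_entry {
	const char *name;
	int is_tree;
};

/*
 * Tree-object order: compare the common prefix byte-wise; if it ties, the
 * next byte decides, where a subtree whose name has ended contributes an
 * implicit '/' and a blob whose name has ended contributes NUL.
 */
static int demo_tree_order(const void *va, const void *vb)
{
	const struct demo_tree_entry *a = va, *b = vb;
	size_t alen = strlen(a->name), blen = strlen(b->name);
	size_t len = alen < blen ? alen : blen;
	unsigned char ca, cb;
	int cmp = memcmp(a->name, b->name, len);

	if (cmp)
		return cmp;
	ca = (unsigned char)a->name[len];
	cb = (unsigned char)b->name[len];
	if (!ca && a->is_tree)
		ca = '/';
	if (!cb && b->is_tree)
		cb = '/';
	return (ca < cb) ? -1 : (ca > cb) ? 1 : 0;
}
```

Sorting { "t" (tree), "t=1", "t-2", "t-1" } with qsort() and this comparator yields "t-1", "t-2", "t", "t=1", the same order as the example in the comment above extended_entry_extract().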

			`static int check_entry_match(const char *a, int a_len, const char *b, int b_len)`
			`{`
			`/*`
			`* The caller wants to pick a from a tree or nothing.`
			`* We are looking at b in a tree.`
			`*`
			`* (0) If a and b are the same name, we are trivially happy.`
			`*`
			`* There are three possibilities where a could be hiding`
			`* behind b.`
			`*`
			`* (1) a == "t", b == "ab" i.e. b sorts earlier than a no`
			`* matter what.`
			`* (2) a == "t", b == "t-2" and "t" is a subtree in the tree;`
			`* (3) a == "t-2", b == "t" and "t-2" is a blob in the tree.`
			`*`
			`* Otherwise we know a won't appear in the tree without`
			`* scanning further.`
			`*/`

			`int cmp = name_compare(a, a_len, b, b_len);`

			`/* Most common case first -- reading sync'd trees */`
			`if (!cmp)`
			`return cmp;`

			`if (0 < cmp) {`
			`/* a comes after b; it does not matter if it is case (3)`
			`if (b_len < a_len && !memcmp(a, b, b_len) && a[b_len] < '/')`
			`return 1;`
			`*/`
			`return 1; /* keep looking */`
			`}`

			`/* b comes after a; are we looking at case (2)? */`
			`if (a_len < b_len && !memcmp(a, b, a_len) && b[a_len] < '/')`
			`return 1; /* keep looking */`

			`return -1; /* a cannot appear in the tree */`
			`}`
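The decision table in check_entry_match() can be exercised on its own. The copy below keeps the logic unchanged: it is only renamed with a demo_ prefix, and the commented-out case (3) test is dropped. This lets the cases listed in the comment be checked directly:

- 0 means b is a.
- 1 means a may still be hiding behind b, so the caller keeps looking.
- -1 means a cannot appear later in this tree.

```c
#include <string.h>

/* Copy of name_compare(): byte order, shorter string first on a tie. */
static int demo_name_compare(const char *a, int a_len, const char *b, int b_len)
{
	int len = (a_len < b_len) ? a_len : b_len;
	int cmp = memcmp(a, b, len);
	if (cmp)
		return cmp;
	return a_len - b_len;
}

/* Copy of check_entry_match(): 0 found, 1 keep looking, -1 not in tree. */
static int demo_check_entry_match(const char *a, int a_len,
				  const char *b, int b_len)
{
	int cmp = demo_name_compare(a, a_len, b, b_len);

	if (!cmp)
		return 0;
	if (cmp > 0)
		return 1;  /* b sorts before a: cases (1) and (3) */
	/* b sorts after a: case (2), a subtree "a" behind "a" + byte < '/' */
	if (a_len < b_len && !memcmp(a, b, a_len) && b[a_len] < '/')
		return 1;
	return -1;
}
```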

			`/*`
			`* From the extended tree_desc, extract the first name entry, while`
			`* paying attention to the candidate "first" name. Most importantly,`
			`* when looking for an entry, if there are entries that sort earlier`
			`* in the tree object representation than that name, skip them and`
			`* process the named entry first. We will remember that we haven't`
			`* processed the first entry yet, and in the later call skip the`
			`* entry we processed early when update_extended_entry() is called.`
			`*`
			`* E.g. if the underlying tree object has these entries:`
			`*`
			`* blob "t-1"`
			`* blob "t-2"`
			`* tree "t"`
			`* blob "t=1"`
			`*`
			`* and the "first" asks for "t", remember that we still need to`
			`* process "t-1" and "t-2" but extract "t". After processing the`
			`* entry "t" from this call, the caller will let us know by calling`
			`* update_extended_entry() that we can remember "t" has been processed`
			`* already.`
			`*/`

			`static void extended_entry_extract(struct tree_desc_x *t,`
			`struct name_entry *a,`
			`const char *first,`
			`int first_len)`
			`{`
			`const char *path;`
			`int len;`
			`struct tree_desc probe;`
			`struct tree_desc_skip *skip;`

			`/*`
			`* Extract the first entry from the tree_desc, but skip the`
			`* ones that we already returned in earlier rounds.`
			`*/`
			`while (1) {`
			`if (!t->d.size) {`
			`entry_clear(a);`
			`break; /* not found */`
			`}`
			`entry_extract(&t->d, a);`
			`for (skip = t->skip; skip; skip = skip->prev)`
			`if (a->path == skip->ptr)`
			`break; /* found */`
			`if (!skip)`
			`break;`
			`/* We have processed this entry already. */`
			`update_tree_entry(&t->d);`
			`}`

			`if (!first || !a->path)`
			`return;`

			`/*`
			`* The caller wants "first" from this tree, or nothing.`
			`*/`
			`path = a->path;`
			`len = tree_entry_len(a);`
			`switch (check_entry_match(first, first_len, path, len)) {`
			`case -1:`
			`entry_clear(a);`
			`case 0:`
			`return;`
			`default:`
			`break;`
			`}`

			`/*`
			`* We need to look-ahead -- we suspect that a subtree whose`
			`* name is "first" may be hiding behind the current entry "path".`
			`*/`
			`probe = t->d;`
			`while (probe.size) {`
			`entry_extract(&probe, a);`
			`path = a->path;`
			`len = tree_entry_len(a);`
			`switch (check_entry_match(first, first_len, path, len)) {`
			`case -1:`
			`entry_clear(a);`
			`case 0:`
			`return;`
			`default:`
			`update_tree_entry(&probe);`
			`break;`
			`}`
			`/* keep looking */`
			`}`
			`entry_clear(a);`
			`}`
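extended_entry_extract() scans ahead only while check_entry_match() says the wanted name might still be hiding. The first -1, or the end of the tree, stops the probe.

Here is a simplified sketch of that look-ahead over an array of names already in tree order. demo_match() condenses name_compare() and check_entry_match() into one function, and demo_lookahead() is hypothetical. Unlike the real code, it returns an index instead of filling a name_entry, and it does not consult the skip list.

```c
#include <string.h>

/* name_compare() + check_entry_match() condensed: 0 found, 1 keep looking,
 * -1 "a" cannot appear at or after "b" in this tree. */
static int demo_match(const char *a, const char *b)
{
	size_t a_len = strlen(a), b_len = strlen(b);
	size_t len = a_len < b_len ? a_len : b_len;
	int cmp = memcmp(a, b, len);

	if (!cmp)
		cmp = (a_len > b_len) - (a_len < b_len);
	if (!cmp)
		return 0;
	if (cmp > 0)
		return 1;
	if (a_len < b_len && !memcmp(a, b, a_len) && b[a_len] < '/')
		return 1;
	return -1;
}

/* Hypothetical: names[0..n-1] are one tree's unconsumed entries in tree
 * order. Return the index of the entry named first, or -1 if it is not
 * there, probing forward only while it may still be hiding. */
static int demo_lookahead(const char *const *names, int n, const char *first)
{
	int i;

	for (i = 0; i < n; i++) {
		switch (demo_match(first, names[i])) {
		case 0:
			return i;
		case -1:
			return -1;
		default:
			break; /* keep looking */
		}
	}
	return -1;
}
```

For tree A in the D/F example, { "t-1", "t-2", "t", "t=1" } with "t" a subtree, asking for "t" skips past both blobs and finds the subtree at index 2. Asking for "t" among only { "t=1" } gives up immediately.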

			`static void update_extended_entry(struct tree_desc_x *t, struct name_entry *a)`
			`{`
			`if (t->d.entry.path == a->path) {`
			`update_tree_entry(&t->d);`
			`} else {`
			`/* we have returned this entry early */`
			`struct tree_desc_skip *skip = xmalloc(sizeof(*skip));`
			`skip->ptr = a->path;`
			`skip->prev = t->skip;`
			`t->skip = skip;`
			`}`
			`}`

			`static void free_extended_entry(struct tree_desc_x *t)`
			`{`
			`struct tree_desc_skip *p, *s;`

			`for (s = t->skip; s; s = p) {`
			`p = s->prev;`
			`free(s);`
			`}`
			`}`
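The skip list records entries by address, not by name. Because a->path points into the tree buffer, pointer equality identifies an entry that was already handed out early.

Below is a minimal standalone sketch of the push, lookup and free steps performed by update_extended_entry(), extended_entry_extract() and free_extended_entry(). demo_skip and its helpers are hypothetical. malloc() stands in for git's xmalloc(), which dies on allocation failure instead of returning NULL.

```c
#include <stdlib.h>

/* Same shape as struct tree_desc_skip: a LIFO list of entry addresses. */
struct demo_skip {
	struct demo_skip *prev;
	const void *ptr;
};

/* Remember that the entry at ptr was already returned early. */
static int demo_skip_push(struct demo_skip **head, const void *ptr)
{
	struct demo_skip *s = malloc(sizeof(*s));
	if (!s)
		return -1;
	s->ptr = ptr;
	s->prev = *head;
	*head = s;
	return 0;
}

/* Was the entry at ptr already processed? Compare addresses only. */
static int demo_skip_contains(const struct demo_skip *head, const void *ptr)
{
	for (; head; head = head->prev)
		if (head->ptr == ptr)
			return 1;
	return 0;
}

/* Release the whole list, reading prev before each node is freed. */
static void demo_skip_free(struct demo_skip **head)
{
	struct demo_skip *s, *p;

	for (s = *head; s; s = p) {
		p = s->prev;
		free(s);
	}
	*head = NULL;
}
```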

traverse_trees(): allow pruning with pathspec The traverse_trees() machinery is primarily meant for merging two (or more) trees, and because a merge is a full tree operation, it doesn't support any pruning with pathspec. Since d1f2d7e (Make run_diff_index() use unpack_trees(), not read_tree(), 2008-01-19), however, we use unpack_trees() to traverse_trees() callchain to perform "diff-index", which could waste a lot of work traversing trees outside the user-supplied pathspec, only to discard at the blob comparison level in diff-lib.c::oneway_diff() which is way too late. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-29 21:26:05 +02:00			`static inline int prune_traversal(struct name_entry *e,`
			`struct traverse_info *info,`
			`struct strbuf *base,`
			`int still_interesting)`
			`{`
			`if (!info->pathspec || still_interesting == 2)`
			`return 2;`
			`if (still_interesting < 0)`
			`return still_interesting;`
			`return tree_entry_interesting(e, base, 0, info->pathspec);`
			`}`

Add return value to 'traverse_tree()' callback This allows the callback to return an error value, but it can also specify which of the tree entries that it actually used up by returning a positive mask value. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-06 04:44:06 +01:00			`int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)`
			`{`
			`int error = 0;`
			`struct name_entry *entry = xmalloc(n*sizeof(*entry));`
			`int i;`
			`struct tree_desc_x *tx = xcalloc(n, sizeof(*tx));`
			`struct strbuf base = STRBUF_INIT;`
			`int interesting = 1;`

			`for (i = 0; i < n; i++)`
			`tx[i].d = t[i];`

			`if (info->prev) {`
			`strbuf_grow(&base, info->pathlen);`
			`make_traverse_path(base.buf, info->prev, &info->name);`
			`base.buf[info->pathlen-1] = '/';`
			`strbuf_setlen(&base, info->pathlen);`
			`}`
			`for (;;) {`
traverse_trees(): clarify return value of the callback The variable name "ret" sounds like the variable to be returned, but since e6c111b4 we return error, and it is misleading. As this variable tells us which trees in t[] array were used in the callback function, so that this caller can know the entries in which of the trees need advancing, "trees_used" is a better name. Also the assignment to 0 was removed at the start of the function as well after the "if (interesting)" block. Those are unneeded as that variable is set to the callback return value any time we enter the "if (interesting)" block, so we'd overwrite old values anyway. Signed-off-by: Stefan Beller <stefanbeller@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-19 22:26:32 +02:00			`int trees_used;`
			`unsigned long mask, dirmask;`
			`const char *first = NULL;`
			`int first_len = 0;`
Fix some "variable might be used uninitialized" warnings In particular, gcc complains as follows: CC tree-walk.o tree-walk.c: In function `traverse_trees': tree-walk.c:347: warning: 'e' might be used uninitialized in this \ function CC builtin/revert.o builtin/revert.c: In function `verify_opt_mutually_compatible': builtin/revert.c:113: warning: 'opt2' might be used uninitialized in \ this function Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-09-11 21:39:32 +02:00			`struct name_entry *e = NULL;`
			`int len;`

			`for (i = 0; i < n; i++) {`
			`e = entry + i;`
			`extended_entry_extract(tx + i, e, NULL, 0);`
			`}`

			`/*`
			`* A tree may have "t-2" at the current location even`
			`* though it may have "t" that is a subtree behind it,`
			`* and another tree may return "t". We want to grab`
			`* all "t" from all trees to match in such a case.`
			`*/`
			`for (i = 0; i < n; i++) {`
			`e = entry + i;`
			`if (!e->path)`
			`continue;`
tree-walk.c: do not leak internal structure in tree_entry_len() tree_entry_len() does not simply take two random arguments and return a tree length. The two pointers must point to a tree item structure, or struct name_entry. Passing random pointers will return incorrect value. Force callers to pass struct name_entry instead of two pointers (with hope that they don't manually construct struct name_entry themselves) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:09 +02:00			`len = tree_entry_len(e);`
			`if (!first) {`
			`first = e->path;`
			`first_len = len;`
			`continue;`
			`}`
			`if (name_compare(e->path, len, first, first_len) < 0) {`
			`first = e->path;`
			`first_len = len;`
			`}`
			`}`

			`if (first) {`
			`for (i = 0; i < n; i++) {`
			`e = entry + i;`
			`extended_entry_extract(tx + i, e, first, first_len);`
			`/* Cull the ones that are not the earliest */`
			`if (!e->path)`
			`continue;`
			`len = tree_entry_len(e);`
			`if (name_compare(e->path, len, first, first_len))`
			`entry_clear(e);`
			`}`
			`}`

			`/* Now we have in entry[i] the earliest name from the trees */`
			`mask = 0;`
			`dirmask = 0;`
			`for (i = 0; i < n; i++) {`
			`if (!entry[i].path)`
			`continue;`
			`mask |= 1ul << i;`
Make 'traverse_trees()' traverse conflicting DF entries in parallel This makes the traverse_trees() entry comparator routine use the more relaxed form of name comparison that considers files and directories with the same name identical. We pass in a separate mask for just the directory entries, so that the callback routine can decide (if it wants to) to only handle one or the other type, but generally most (all?) users are expected to really want to see the case of a name 'foo' showing up in one tree as a file and in another as a directory at the same time. In particular, moving 'unpack_trees()' over to use this tree traversal mechanism requires this. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-06 05:06:18 +01:00			`if (S_ISDIR(entry[i].mode))`
			`dirmask |= 1ul << i;`
traverse_trees(): allow pruning with pathspec The traverse_trees() machinery is primarily meant for merging two (or more) trees, and because a merge is a full tree operation, it doesn't support any pruning with pathspec. Since d1f2d7e (Make run_diff_index() use unpack_trees(), not read_tree(), 2008-01-19), however, we use unpack_trees() to traverse_trees() callchain to perform "diff-index", which could waste a lot of work traversing trees outside the user-supplied pathspec, only to discard at the blob comparison level in diff-lib.c::oneway_diff() which is way too late. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-29 21:26:05 +02:00			`e = &entry[i];`
			`}`
			`if (!mask)`
			`break;`
			`interesting = prune_traversal(e, info, &base, interesting);`
			`if (interesting < 0)`
			`break;`
			`if (interesting) {`
traverse_trees(): clarify return value of the callback The variable name "ret" sounds like the variable to be returned, but since e6c111b4 we return error, and it is misleading. As this variable tells us which trees in t[] array were used in the callback function, so that this caller can know the entries in which of the trees need advancing, "trees_used" is a better name. Also the assignment to 0 was removed at the start of the function as well after the "if (interesting)" block. Those are unneeded as that variable is set to the callback return value any time we enter the "if (interesting)" block, so we'd overwrite old values anyway. Signed-off-by: Stefan Beller <stefanbeller@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-19 22:26:32 +02:00			`trees_used = info->fn(n, mask, dirmask, entry, info);`
			`if (trees_used < 0) {`
			`error = trees_used;`
			`if (!info->show_all_errors)`
			`break;`
			`}`
			`mask &= trees_used;`
			`}`
			`for (i = 0; i < n; i++)`
Add return value to 'traverse_tree()' callback This allows the callback to return an error value, but it can also specify which of the tree entries that it actually used up by returning a positive mask value. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-06 04:44:06 +01:00			`if (mask & (1ul << i))`
			`update_extended_entry(tx + i, entry + i);`
			`}`
			`free(entry);`
			`for (i = 0; i < n; i++)`
			`free_extended_entry(tx + i);`
			`free(tx);`
			`strbuf_release(&base);`
			`return error;`
			`}`
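The loop body of `traverse_trees()` above amounts to a k-way merge over sorted entry lists. What follows is a minimal standalone sketch of that merge, not code from tree-walk.c: the names `toy_tree`, `toy_fn`, `toy_name_compare`, `toy_traverse`, `struct toy_log` and `record` are hypothetical, plain sorted string arrays stand in for `struct tree_desc`, and the sketch omits both `prune_traversal()` and the `extended_entry_extract()` lookahead (so it does not handle a tree holding `t-1` ahead of a directory `t`). It only illustrates how `mask` and `dirmask` are built from the earliest name and how `mask &= trees_used` decides which trees advance.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one tree being walked: names in sorted
 * order plus a parallel "is a directory" flag for each entry. */
struct toy_tree {
	const char **names;
	const int *is_dir;
	int count;
	int pos; /* index of the current entry */
};

/* Callback: returns the mask of trees it consumed, or < 0 on error. */
typedef int (*toy_fn)(int n, unsigned long mask, unsigned long dirmask,
		      const char *name, void *data);

/* Modeled on name_compare(): compare the common prefix, then the
 * shorter name sorts first. A file "t" and a directory "t" compare
 * equal, so they are reported in the same step. */
static int toy_name_compare(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);
	int cmp = memcmp(a, b, la < lb ? la : lb);

	if (cmp)
		return cmp;
	return (la > lb) - (la < lb);
}

static int toy_traverse(int n, struct toy_tree *t, toy_fn fn, void *data)
{
	/* mask and dirmask use one bit per tree. */
	assert(n > 0 && n <= (int)(sizeof(unsigned long) * CHAR_BIT));
	for (;;) {
		const char *first = NULL;
		unsigned long mask = 0, dirmask = 0;
		int i, used;

		/* Pick the earliest current name across all trees. */
		for (i = 0; i < n; i++) {
			if (t[i].pos >= t[i].count)
				continue;
			if (!first || toy_name_compare(t[i].names[t[i].pos], first) < 0)
				first = t[i].names[t[i].pos];
		}
		if (!first)
			return 0; /* every tree is exhausted */

		/* Mark the trees that have that name, and which have it as a directory. */
		for (i = 0; i < n; i++) {
			if (t[i].pos >= t[i].count ||
			    toy_name_compare(t[i].names[t[i].pos], first))
				continue;
			mask |= 1ul << i;
			if (t[i].is_dir[t[i].pos])
				dirmask |= 1ul << i;
		}

		used = fn(n, mask, dirmask, first, data);
		if (used < 0)
			return used;
		/* Advance only the trees the callback consumed; it must
		 * consume at least one, or this loop makes no progress. */
		mask &= (unsigned long)used;
		for (i = 0; i < n; i++)
			if (mask & (1ul << i))
				t[i].pos++;
	}
}

/* Example callback: log "name:mask:dirmask " and consume every match. */
struct toy_log { char buf[256]; };

static int record(int n, unsigned long mask, unsigned long dirmask,
		  const char *name, void *data)
{
	struct toy_log *out = data;
	char line[64];

	(void)n;
	snprintf(line, sizeof(line), "%s:%lu:%lu ", name, mask, dirmask);
	strncat(out->buf, line, sizeof(out->buf) - strlen(out->buf) - 1);
	return (int)mask;
}
```

For tree A = {`a`, directory `t`} and tree B = {file `t`, `u`}, `record` logs `a:1:0 t:3:1 u:2:0 `: the file and the directory named `t` arrive in a single callback, with `dirmask` flagging tree A. Returning a mask rather than a plain status is what lets a callback consume only some of the trees at a given name.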

get_tree_entry(): make it available from tree-walk Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-19 23:05:47 +02:00			`static int find_tree_entry(struct tree_desc *t, const char *name, unsigned char *result, unsigned *mode)`
			`{`
			`int namelen = strlen(name);`
			`while (t->size) {`
			`const char *entry;`
			`const unsigned char *sha1;`
			`int entrylen, cmp;`

			`sha1 = tree_entry_extract(t, &entry, mode);`
			`entrylen = tree_entry_len(&t->entry);`
			`update_tree_entry(t);`
			`if (entrylen > namelen)`
			`continue;`
			`cmp = memcmp(name, entry, entrylen);`
			`if (cmp > 0)`
			`continue;`
			`if (cmp < 0)`
			`break;`
			`if (entrylen == namelen) {`
Convert memcpy(a,b,20) to hashcpy(a,b). This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void. This is a reasonable tradeoff as most call sites already use unsigned char and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-23 08:49:00 +02:00			`hashcpy(result, sha1);`
			`return 0;`
			`}`
			`if (name[entrylen] != '/')`
			`continue;`
			`if (!S_ISDIR(*mode))`
			`break;`
			`if (++entrylen == namelen) {`
			`hashcpy(result, sha1);`
			`return 0;`
			`}`
			`return get_tree_entry(sha1, name + entrylen, result, mode);`
			`}`
			`return -1;`
			`}`

			`int get_tree_entry(const unsigned char *tree_sha1, const char *name, unsigned char *sha1, unsigned *mode)`
			`{`
			`int retval;`
			`void *tree;`
Initialize tree descriptors with a helper function rather than by hand. This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 18:08:25 +01:00			`unsigned long size;`
get_tree_entry: map blank requested entry to tree root This means that git show HEAD: will now return HEAD^{tree}, which is logically consistent with git show HEAD:Documentation Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-09 17:11:47 +01:00			`unsigned char root[20];`
			`tree = read_object_with_reference(tree_sha1, tree_type, &size, root);`
			`if (!tree)`
			`return -1;`

			`if (name[0] == '\0') {`
			`hashcpy(sha1, root);`
fix minor memory leak in get_tree_entry() Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-02-14 10:56:46 +01:00			`free(tree);`
			`return 0;`
			`}`

get_tree_entry(): do not call find_tree_entry() on an empty tree We know we will find nothing. This incidentally squelches false warning from gcc about potentially uninitialized usage of t.entry fields. For an empty tree, it is true that init_tree_desc() does not call decode_tree_entry() and the tree_desc is left uninitialized, but find_tree_entry() only calls tree_entry_extract() that uses the tree_desc while it has more things to read from the tree, so the uninitialized t.entry fields are never used in such a case anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-27 20:18:40 +02:00			`if (!size) {`
			`retval = -1;`
			`} else {`
			`struct tree_desc t;`
			`init_tree_desc(&t, tree, size);`
			`retval = find_tree_entry(&t, name, sha1, mode);`
			`}`
			`free(tree);`
			`return retval;`
			`}`
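`find_tree_entry()` and `get_tree_entry()` together resolve a slash-separated path one component at a time. The sketch below is a minimal standalone model of that lookup, not git code: `toy_entry`, `toy_dir`, `toy_find` and `toy_get` are hypothetical names, integer ids stand in for SHA-1s, and in-memory arrays replace `read_object_with_reference()` and `struct tree_desc`. It keeps the same decisions as the functions above: skip entries longer than the remaining path, keep scanning while the path sorts later, stop once it sorts earlier, refuse to descend into a non-directory, and map an empty path to the tree itself (the `HEAD:` case) before the empty-tree short-circuit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_dir;

/* Hypothetical in-memory tree entry; an int id stands in for the SHA-1. */
struct toy_entry {
	const char *name;
	int is_dir;
	int id;
	const struct toy_dir *subtree; /* non-NULL only when is_dir */
};

/* Hypothetical tree object: entries sorted by name. */
struct toy_dir {
	int id;
	const struct toy_entry *entries;
	int count;
};

static int toy_get(const struct toy_dir *tree, const char *name,
		   int *result, int *is_dir);

/* Mirrors the scan in find_tree_entry(). */
static int toy_find(const struct toy_dir *tree, const char *name,
		    int *result, int *is_dir)
{
	size_t namelen = strlen(name);
	int i;

	for (i = 0; i < tree->count; i++) {
		const struct toy_entry *e = &tree->entries[i];
		size_t entrylen = strlen(e->name);
		int cmp;

		if (entrylen > namelen)
			continue;	/* longer than the whole remaining path */
		cmp = memcmp(name, e->name, entrylen);
		if (cmp > 0)
			continue;	/* path sorts later: keep scanning */
		if (cmp < 0)
			break;		/* sorted entries: already past it */
		if (entrylen == namelen) {
			*result = e->id;
			*is_dir = e->is_dir;
			return 0;
		}
		if (name[entrylen] != '/')
			continue;	/* e.g. entry "a" vs. path "ab" */
		if (!e->is_dir)
			break;		/* "file/more" cannot exist */
		if (++entrylen == namelen) {	/* path ends in "dir/" */
			*result = e->id;
			*is_dir = 1;
			return 0;
		}
		/* Descend into the subtree with the rest of the path. */
		return toy_get(e->subtree, name + entrylen, result, is_dir);
	}
	return -1;
}

/* Mirrors get_tree_entry(): empty path first, then empty tree. */
static int toy_get(const struct toy_dir *tree, const char *name,
		   int *result, int *is_dir)
{
	assert(tree != NULL && name != NULL);
	if (name[0] == '\0') {	/* "HEAD:" names the tree itself */
		*result = tree->id;
		*is_dir = 1;
		return 0;
	}
	if (!tree->count)	/* empty tree: nothing to find */
		return -1;
	return toy_find(tree, name, result, is_dir);
}
```

With a root holding `README` and a directory `src` that contains `x.c`, looking up `""` yields the root, `"src/"` yields the `src` tree, `"src/x.c"` recurses once into `src`, and `"README/x"` fails because a file cannot have children.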
Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00
parse_pathspec: accept :(icase)path syntax Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:09 +02:00			`static int match_entry(const struct pathspec_item *item,`
			`const struct name_entry *entry, int pathlen,`
tree_entry_interesting(): refactor into separate smaller functions Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:43 +01:00			`const char *match, int matchlen,`
tree-walk: use enum interesting instead of integer Commit d688cf0 (tree_entry_interesting(): give meaningful names to return values - 2011-10-24) converts most of the tree_entry_interesting values to the new enum, except "never_interesting". This completes the conversion. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-10-19 19:14:42 +02:00			`enum interesting *never_interesting)`
tree_entry_interesting(): refactor into separate smaller functions Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:43 +01:00			`{`
			`int m = -1; /* signals that we haven't called strncmp() */`

parse_pathspec: accept :(icase)path syntax Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:09 +02:00			`if (item->magic & PATHSPEC_ICASE)`
			`/*`
			`* "Never interesting" trick requires exact`
			`* matching. We could do something clever with inexact`
			`* matching, but it's trickier (and not to forget that`
			`* strcasecmp is locale-dependent, at least in`
			`* glibc). Just disable it for now. It can't be worse`
			`* than the wildcard's codepath of '[Tt][Hi][Is][Ss]'`
			`* pattern.`
			`*/`
			`*never_interesting = entry_not_interesting;`
			`else if (*never_interesting != entry_not_interesting) {`
tree_entry_interesting(): refactor into separate smaller functions Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:43 +01:00			`/*`
			`* We have not seen any match that sorts later`
			`* than the current path.`
			`*/`

			`/*`
			`* Does match sort strictly earlier than path`
			`* with their common parts?`
			`*/`
			`m = strncmp(match, entry->path,`
			`(matchlen < pathlen) ? matchlen : pathlen);`
			`if (m < 0)`
			`return 0;`

			`/*`
			`* If we come here even once, that means there is at`
			`* least one pathspec that would sort equal to or`
			`* later than the path we are currently looking at.`
			`* In other words, if we have never reached this point`
			`* after iterating all pathspecs, it means all`
			`* pathspecs are either outside of base, or inside the`
			`* base but sorts strictly earlier than the current`
			`* one. In either case, they will never match the`
			`* subsequent entries. In such a case, we initialized`
			`* the variable to -1 and that is what will be`
			`* returned, allowing the caller to terminate early.`
			`*/`
tree-walk: use enum interesting instead of integer Commit d688cf0 (tree_entry_interesting(): give meaningful names to return values - 2011-10-24) converts most of the tree_entry_interesting values to the new enum, except "never_interesting". This completes the conversion. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-10-19 19:14:42 +02:00			`*never_interesting = entry_not_interesting;`
tree_entry_interesting(): refactor into separate smaller functions Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:43 +01:00			`}`

			`if (pathlen > matchlen)`
			`return 0;`

			`if (matchlen > pathlen) {`
			`if (match[pathlen] != '/')`
			`return 0;`
tree-walk.c: ignore trailing slash on submodule in tree_entry_interesting() We do ignore trailing slash on a directory, so pathspec "abc/" matches directory "abc". A submodule is also a directory. Apply the same logic to it. This makes "git log submodule-path" and "git log submodule-path/" produce the same output. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-01-23 14:22:05 +01:00			`if (!S_ISDIR(entry->mode) && !S_ISGITLINK(entry->mode))`
tree_entry_interesting(): refactor into separate smaller functions Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:43 +01:00			`return 0;`
			`}`

			`if (m == -1)`
			`/*`
			`* we cheated and did not do strncmp(), so we do`
			`* that here.`
			`*/`
parse_pathspec: accept :(icase)path syntax Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:09 +02:00			`m = ps_strncmp(item, match, entry->path, pathlen);`
tree_entry_interesting(): refactor into separate smaller functions Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:43 +01:00
			`/*`
			`* If common part matched earlier then it is a hit,`
			`* because we rejected the case where path is not a`
			`* leading directory and is shorter than match.`
			`*/`
			`if (!m)`
parse_pathspec: accept :(icase)path syntax Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:09 +02:00			`/*`
			`* match_entry does not check if the prefix part is`
			`* matched case-sensitively. If the entry is a`
			`* directory and part of prefix, it'll be rematched`
			`* eventually by basecmp with special treatment for`
			`* the prefix.`
			`*/`
tree_entry_interesting(): refactor into separate smaller functions Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:43 +01:00			`return 1;`

			`return 0;`
			`}`
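The "never interesting" trick in `match_entry()` relies on tree entries being sorted: once a literal pathspec sorts strictly before the current entry (on their common prefix), no later entry can match it, so the walker may stop. The following is a minimal, simplified sketch of that idea, assuming a flat byte-sorted array of names and a single literal pathspec. `demo_entry`, `demo_match_entry()` and `demo_walk()` are hypothetical names, not git API, and git's real tree ordering (directories compare as if they had a trailing '/') is ignored here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a tree entry: a name plus a directory flag. */
struct demo_entry {
	const char *path;
	int is_dir;
};

/*
 * Simplified match_entry(): returns 1 if the entry equals the pathspec or
 * is a leading directory of it. Clears *never_interesting whenever the
 * pathspec sorts equal to or later than the entry, i.e. it may still match
 * this entry or a later one.
 */
static int demo_match_entry(const struct demo_entry *e,
			    const char *match, int *never_interesting)
{
	int pathlen = (int)strlen(e->path);
	int matchlen = (int)strlen(match);
	int common = matchlen < pathlen ? matchlen : pathlen;
	int m = strncmp(match, e->path, (size_t)common);

	if (m < 0)
		return 0;	/* pathspec sorts before this entry */
	*never_interesting = 0;

	if (pathlen > matchlen)
		return 0;
	if (matchlen > pathlen) {
		/* The entry must be a directory leading up to the pathspec. */
		if (match[pathlen] != '/' || !e->is_dir)
			return 0;
	}
	return m == 0;
}

/*
 * Walks a sorted entry list; returns how many entries were examined and
 * stores the index of the first hit (or -1) in *hit.
 */
static int demo_walk(const struct demo_entry *ents, int n,
		     const char *match, int *hit)
{
	int i;

	assert(n >= 0);
	*hit = -1;
	for (i = 0; i < n; i++) {
		int never_interesting = 1;

		if (demo_match_entry(&ents[i], match, &never_interesting)) {
			if (*hit < 0)
				*hit = i;
		} else if (never_interesting) {
			return i + 1;	/* nothing later can match: stop early */
		}
	}
	return n;
}
```

With the entries `Makefile`, `README`, `src/` and `t/`, the pathspec `src/main.c` hits `src` and the walk stops at `t`, because `s` sorts before `t`. As the comment inside `match_entry()` explains, with several pathspecs the early exit is only taken when no item clears the flag.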

parse_pathspec: accept :(icase)path syntax Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:09 +02:00			`/* :(icase)-aware string compare */`
			`static int basecmp(const struct pathspec_item *item,`
			`const char *base, const char *match, int len)`
			`{`
			`if (item->magic & PATHSPEC_ICASE) {`
			`int ret, n = len > item->prefix ? item->prefix : len;`
			`ret = strncmp(base, match, n);`
			`if (ret)`
			`return ret;`
			`base += n;`
			`match += n;`
			`len -= n;`
			`}`
			`return ps_strncmp(item, base, match, len);`
			`}`
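`basecmp()` splits the comparison for `:(icase)` pathspecs. The first `item->prefix` bytes (the part contributed by the working-directory prefix) are compared case-sensitively, and only the remainder goes through `ps_strncmp()`. The following is a standalone sketch, assuming that `ps_strncmp()` falls back to a case-insensitive compare for `:(icase)` items, modeled here with POSIX `strncasecmp()`. `demo_basecmp()` and its `icase`/`prefix` parameters are illustrative stand-ins for `item->magic` and `item->prefix`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <strings.h>	/* strncasecmp() (POSIX) */

/*
 * Sketch of basecmp(): with icase set, the first `prefix` bytes must match
 * exactly and the rest is compared ignoring ASCII case. Returns 0 on match.
 */
static int demo_basecmp(int icase, int prefix,
			const char *base, const char *match, int len)
{
	assert(len >= 0 && prefix >= 0);
	if (icase) {
		int ret, n = len > prefix ? prefix : len;

		ret = strncmp(base, match, (size_t)n);	/* prefix: exact */
		if (ret)
			return ret;
		base += n;
		match += n;
		len -= n;
		return strncasecmp(base, match, (size_t)len);	/* rest: icase */
	}
	return strncmp(base, match, (size_t)len);
}
```

For example, with a prefix of `src/`, the base `src/FOO/` matches `:(icase)foo/`, but `SRC/foo/` does not, because the prefix part is still compared exactly.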

			`static int match_dir_prefix(const struct pathspec_item *item,`
			`const char *base,`
tree_entry_interesting(): refactor into separate smaller functions Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:43 +01:00			`const char *match, int matchlen)`
			`{`
parse_pathspec: accept :(icase)path syntax Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:09 +02:00			`if (basecmp(item, base, match, matchlen))`
tree_entry_interesting(): refactor into separate smaller functions Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:43 +01:00			`return 0;`

			`/*`
			`* If the base is a subdirectory of a path which`
			`* was specified, all of them are interesting.`
			`*/`
			`if (!matchlen ||`
			`base[matchlen] == '/' ||`
			`match[matchlen - 1] == '/')`
			`return 1;`

			`/* Just a random prefix match */`
			`return 0;`
			`}`
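`match_dir_prefix()` accepts a base only when the pathspec ends on a directory boundary inside it. A pathspec of `drivers` or `drivers/` covers everything under `drivers/ata/`, while `driv` is merely a string prefix. The sketch below is case-sensitive, so it uses `strncmp()` where git calls `basecmp()`. `demo_match_dir_prefix()` is a hypothetical name, and it assumes the caller's guarantees from `do_match()`: the base is at least `matchlen` bytes long and ends with '/'.

```c
#include <assert.h>
#include <string.h>

/*
 * Case-sensitive sketch of match_dir_prefix(). Returns 1 when everything
 * under base is covered by the pathspec, 0 otherwise.
 */
static int demo_match_dir_prefix(const char *base, const char *match,
				 int matchlen)
{
	assert(matchlen >= 0 && (size_t)matchlen <= strlen(base));
	if (strncmp(base, match, (size_t)matchlen))
		return 0;
	/* Empty pathspec, or the pathspec ends exactly on a '/' boundary. */
	if (!matchlen || base[matchlen] == '/' || match[matchlen - 1] == '/')
		return 1;
	return 0;	/* just a random string prefix, e.g. "driv" */
}
```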

tree_entry_interesting: do basedir compare on wildcard patterns when possible Currently we treat "*.c" and "path/to/*.c" the same way. Which means we check all possible paths in repo against "path/to/*.c". One could see that "path/elsewhere/foo.c" obviously cannot match "path/to/*.c" and we only need to check all paths _inside_ "path/to/" against that pattern. This patch checks the leading fixed part of a pathspec against base directory and exit early if possible. We could even optimize further in "path/to/something*.c" case (i.e. check the fixed part against name_entry as well) but that's more complicated and probably does not gain us much. -O2 build on linux-2.6, without and with this patch respectively: $ time git rev-list --quiet HEAD -- 'drivers/*.c' real 1m9.484s user 1m9.128s sys 0m0.181s $ time ~/w/git/git rev-list --quiet HEAD -- 'drivers/*.c' real 0m15.710s user 0m15.564s sys 0m0.107s Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-11-24 05:33:51 +01:00			`/*`
			`* Perform matching on the leading non-wildcard part of`
			`* pathspec. item->nowildcard_len must be greater than zero. Return`
			`* non-zero if base is matched.`
			`*/`
			`static int match_wildcard_base(const struct pathspec_item *item,`
			`const char *base, int baselen,`
			`int *matched)`
			`{`
			`const char *match = item->match;`
			`/* the wildcard part is not considered in this function */`
			`int matchlen = item->nowildcard_len;`

			`if (baselen) {`
			`int dirlen;`
			`/*`
			`* Return early if base is longer than the`
			`* non-wildcard part but it does not match.`
			`*/`
			`if (baselen >= matchlen) {`
			`*matched = matchlen;`
parse_pathspec: accept :(icase)path syntax Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:09 +02:00			`return !basecmp(item, base, match, matchlen);`
tree_entry_interesting: do basedir compare on wildcard patterns when possible Currently we treat "*.c" and "path/to/*.c" the same way. Which means we check all possible paths in repo against "path/to/*.c". One could see that "path/elsewhere/foo.c" obviously cannot match "path/to/*.c" and we only need to check all paths _inside_ "path/to/" against that pattern. This patch checks the leading fixed part of a pathspec against base directory and exit early if possible. We could even optimize further in "path/to/something*.c" case (i.e. check the fixed part against name_entry as well) but that's more complicated and probably does not gain us much. -O2 build on linux-2.6, without and with this patch respectively: $ time git rev-list --quiet HEAD -- 'drivers/*.c' real 1m9.484s user 1m9.128s sys 0m0.181s $ time ~/w/git/git rev-list --quiet HEAD -- 'drivers/*.c' real 0m15.710s user 0m15.564s sys 0m0.107s Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-11-24 05:33:51 +01:00			`}`

			`dirlen = matchlen;`
			`while (dirlen && match[dirlen - 1] != '/')`
			`dirlen--;`

			`/*`
			`* Return early if base is shorter than the`
			`* non-wildcard part but it does not match. Note that`
			`* base ends with '/' so we are sure it really matches`
			`* directory`
			`*/`
parse_pathspec: accept :(icase)path syntax Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:09 +02:00			`if (basecmp(item, base, match, baselen))`
tree_entry_interesting: do basedir compare on wildcard patterns when possible Currently we treat "*.c" and "path/to/*.c" the same way. Which means we check all possible paths in repo against "path/to/*.c". One could see that "path/elsewhere/foo.c" obviously cannot match "path/to/*.c" and we only need to check all paths _inside_ "path/to/" against that pattern. This patch checks the leading fixed part of a pathspec against base directory and exit early if possible. We could even optimize further in "path/to/something*.c" case (i.e. check the fixed part against name_entry as well) but that's more complicated and probably does not gain us much. -O2 build on linux-2.6, without and with this patch respectively: $ time git rev-list --quiet HEAD -- 'drivers/*.c' real 1m9.484s user 1m9.128s sys 0m0.181s $ time ~/w/git/git rev-list --quiet HEAD -- 'drivers/*.c' real 0m15.710s user 0m15.564s sys 0m0.107s Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-11-24 05:33:51 +01:00			`return 0;`
			`*matched = baselen;`
			`} else`
			`*matched = 0;`
			`/*`
			`* we could have checked entry against the non-wildcard part`
			`* that is not in base and does similar never_interesting`
			`* optimization as in match_entry. For now just be happy with`
			`* base comparison.`
			`*/`
			`return entry_interesting;`
			`}`
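The commit message above motivates `match_wildcard_base()` with `drivers/*.c`. Its literal part is `drivers/`, so a base such as `tools/` can be rejected before any `fnmatch()` call, while `drivers/ata/` is passed on with 8 pattern bytes already consumed. The following is a minimal case-sensitive sketch of that base check. `demo_match_wildcard_base()` is a hypothetical name, and `strncmp()` stands in for `basecmp()`. It omits the `dirlen` loop because, in the code shown here, that result is not read afterwards.

```c
#include <assert.h>
#include <string.h>

/*
 * Case-sensitive sketch of match_wildcard_base(). nowildcard_len is the
 * length of the pattern's leading literal part, and base is either empty
 * or ends with '/'. Returns 0 when no path under base can match; otherwise
 * stores in *matched how many pattern bytes the base already consumed.
 */
static int demo_match_wildcard_base(const char *pattern, int nowildcard_len,
				    const char *base, int baselen, int *matched)
{
	assert(nowildcard_len > 0 && baselen >= 0);
	if (!baselen) {
		*matched = 0;	/* top level: nothing to rule out yet */
		return 1;
	}
	if (baselen >= nowildcard_len) {
		/* The base must start with the whole literal part. */
		*matched = nowildcard_len;
		return !strncmp(base, pattern, (size_t)nowildcard_len);
	}
	/* A shorter base must be a leading directory of the literal part. */
	if (strncmp(base, pattern, (size_t)baselen))
		return 0;
	*matched = baselen;
	return 1;
}
```

With the pattern `path/to/*.c`, the base `path/` is kept (5 bytes matched) but `path/elsewhere/` is rejected, which is exactly the case the commit message describes.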

Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00			`/*`
			`* Is a tree entry interesting given the pathspec we have?`
			`*`
grep: drop pathspec_matches() in favor of tree_entry_interesting() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 13:45:33 +01:00			`* Pre-condition: either baselen == base_offset (i.e. empty path)`
			`* or base[baselen-1] == '/' (i.e. with trailing slash).`
Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00			`*/`
Support pathspec magic :(exclude) and its short form :! Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-12-06 08:30:48 +01:00			`static enum interesting do_match(const struct name_entry *entry,`
			`struct strbuf *base, int base_offset,`
			`const struct pathspec *ps,`
			`int exclude)`
Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00			`{`
			`int i;`
grep: drop pathspec_matches() in favor of tree_entry_interesting() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 13:45:33 +01:00			`int pathlen, baselen = base->len - base_offset;`
tree-walk: use enum interesting instead of integer Commit d688cf0 (tree_entry_interesting(): give meaningful names to return values - 2011-10-24) converts most of the tree_entry_interesting values to the new enum, except "never_interesting". This completes the conversion. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-10-19 19:14:42 +02:00			`enum interesting never_interesting = ps->has_wildcard ?`
tree_entry_interesting(): give meaningful names to return values It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:10 +02:00			`entry_not_interesting : all_entries_not_interesting;`
Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00
pathspec: support :(literal) syntax for noglob pathspec Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:06 +02:00			`GUARD_PATHSPEC(ps,`
			`PATHSPEC_FROMTOP |`
			`PATHSPEC_MAXDEPTH |`
pathspec: support :(glob) syntax :(glob)path differs from plain pathspec that it uses wildmatch with WM_PATHNAME while the other uses fnmatch without FNM_PATHNAME. The difference lies in how '*' (and '**') is processed. With the introduction of :(glob) and :(literal) and their global options --[no]glob-pathspecs, the user can: - make everything literal by default via --noglob-pathspecs --literal-pathspecs cannot be used for this purpose as it disables _all_ pathspec magic. - individually turn on globbing with :(glob) - make everything globbing by default via --glob-pathspecs - individually turn off globbing with :(literal) The implication behind this is, there is no way to gain the default matching behavior (i.e. fnmatch without FNM_PATHNAME). You either get new globbing or literal. The old fnmatch behavior is considered deprecated and discouraged to use. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:08 +02:00			`PATHSPEC_LITERAL |`
parse_pathspec: accept :(icase)path syntax Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:09 +02:00			`PATHSPEC_GLOB |`
Support pathspec magic :(exclude) and its short form :! Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-12-06 08:30:48 +01:00			`PATHSPEC_ICASE |`
			`PATHSPEC_EXCLUDE);`
guard against new pathspec magic in pathspec matching code GUARD_PATHSPEC() marks pathspec-sensitive code, basically all those that touch anything in 'struct pathspec' except fields "nr" and "original". GUARD_PATHSPEC() is not supposed to fail. It's mainly to help the designers catch unsupported codepaths. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:35:36 +02:00
tree_entry_interesting(): support depth limit This is needed to replace pathspec_matches() in builtin/grep.c. max_depth == -1 means infinite depth. Depth limit is only effective when pathspec.recursive == 1. When pathspec.recursive == 0, the behavior depends on match functions: non-recursive for tree_entry_interesting() and recursive for match_pathspec{,_depth} Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:44 +01:00			`if (!ps->nr) {`
parse_pathspec: add special flag for max_depth feature match_pathspec_depth() and tree_entry_interesting() check max_depth field in order to support "git grep --max-depth". The feature activation is tied to "recursive" field, which led to some unwanted activation, e.g. 5c8eeb8 (diff-index: enable recursive pathspec matching in unpack_trees - 2012-01-15). This patch decouples the activation from "recursive" field, puts it in "magic" field instead. This makes sure that only "git grep" can activate this feature. And because parse_pathspec knows when the feature is not used, it does not need to sort pathspec (required for max_depth to work correctly). A small win for non-grep cases. Even though a new magic flag is introduced, no magic syntax is. The magic can be only enabled by parse_pathspec() caller. We might someday want to support ":(maxdepth:10)src." It all depends on actual use cases. max_depth feature cannot be enabled via init_pathspec() anymore. But that's ok because init_pathspec() is on its way to /dev/null. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:35:32 +02:00			`if (!ps->recursive ||`
			`!(ps->magic & PATHSPEC_MAXDEPTH) ||`
			`ps->max_depth == -1)`
tree_entry_interesting(): give meaningful names to return values It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:10 +02:00			`return all_entries_interesting;`
			`return within_depth(base->buf + base_offset, baselen,`
			`!!S_ISDIR(entry->mode),`
			`ps->max_depth) ?`
			`entry_interesting : entry_not_interesting;`
tree_entry_interesting(): support depth limit This is needed to replace pathspec_matches() in builtin/grep.c. max_depth == -1 means infinite depth. Depth limit is only effective when pathspec.recursive == 1. When pathspec.recursive == 0, the behavior depends on match functions: non-recursive for tree_entry_interesting() and recursive for match_pathspec{,_depth} Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:44 +01:00			`}`
Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00
tree-walk.c: do not leak internal structure in tree_entry_len() tree_entry_len() does not simply take two random arguments and return a tree length. The two pointers must point to a tree item structure, or struct name_entry. Passing random pointers will return incorrect value. Force callers to pass struct name_entry instead of two pointers (with hope that they don't manually construct struct name_entry themselves) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:09 +02:00			`pathlen = tree_entry_len(entry);`
Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00
grep: drop pathspec_matches() in favor of tree_entry_interesting() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 13:45:33 +01:00			`for (i = ps->nr - 1; i >= 0; i--) {`
Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00			`const struct pathspec_item *item = ps->items+i;`
			`const char *match = item->match;`
grep: drop pathspec_matches() in favor of tree_entry_interesting() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 13:45:33 +01:00			`const char *base_str = base->buf + base_offset;`
tree_entry_interesting: do basedir compare on wildcard patterns when possible Currently we treat "*.c" and "path/to/*.c" the same way. Which means we check all possible paths in repo against "path/to/*.c". One could see that "path/elsewhere/foo.c" obviously cannot match "path/to/*.c" and we only need to check all paths _inside_ "path/to/" against that pattern. This patch checks the leading fixed part of a pathspec against base directory and exit early if possible. We could even optimize further in "path/to/something*.c" case (i.e. check the fixed part against name_entry as well) but that's more complicated and probably does not gain us much. -O2 build on linux-2.6, without and with this patch respectively: $ time git rev-list --quiet HEAD -- 'drivers/*.c' real 1m9.484s user 1m9.128s sys 0m0.181s $ time ~/w/git/git rev-list --quiet HEAD -- 'drivers/*.c' real 0m15.710s user 0m15.564s sys 0m0.107s Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-11-24 05:33:51 +01:00			`int matchlen = item->len, matched = 0;`
Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00
Support pathspec magic :(exclude) and its short form :! Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-12-06 08:30:48 +01:00			`if ((!exclude && item->magic & PATHSPEC_EXCLUDE) ||`
			`( exclude && !(item->magic & PATHSPEC_EXCLUDE)))`
			`continue;`

Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00			`if (baselen >= matchlen) {`
			`/* If it doesn't match, move along... */`
parse_pathspec: accept :(icase)path syntax Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:09 +02:00			`if (!match_dir_prefix(item, base_str, match, matchlen))`
tree_entry_interesting(): support wildcard matching never_interesting optimization is disabled if there is any wildcard pathspec, even if it only matches exactly on trees. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:46 +01:00			`goto match_wildcards;`
tree_entry_interesting(): support depth limit This is needed to replace pathspec_matches() in builtin/grep.c. max_depth == -1 means infinite depth. Depth limit is only effective when pathspec.recursive == 1. When pathspec.recursive == 0, the behavior depends on match functions: non-recursive for tree_entry_interesting() and recursive for match_pathspec{,_depth} Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:44 +01:00
parse_pathspec: add special flag for max_depth feature match_pathspec_depth() and tree_entry_interesting() check max_depth field in order to support "git grep --max-depth". The feature activation is tied to "recursive" field, which led to some unwanted activation, e.g. 5c8eeb8 (diff-index: enable recursive pathspec matching in unpack_trees - 2012-01-15). This patch decouples the activation from "recursive" field, puts it in "magic" field instead. This makes sure that only "git grep" can activate this feature. And because parse_pathspec knows when the feature is not used, it does not need to sort pathspec (required for max_depth to work correctly). A small win for non-grep cases. Even though a new magic flag is introduced, no magic syntax is. The magic can be only enabled by parse_pathspec() caller. We might someday want to support ":(maxdepth:10)src." It all depends on actual use cases. max_depth feature cannot be enabled via init_pathspec() anymore. But that's ok because init_pathspec() is on its way to /dev/null. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:35:32 +02:00			`if (!ps->recursive ||`
			`!(ps->magic & PATHSPEC_MAXDEPTH) ||`
			`ps->max_depth == -1)`
tree_entry_interesting(): give meaningful names to return values It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:10 +02:00			`return all_entries_interesting;`
tree_entry_interesting(): support depth limit This is needed to replace pathspec_matches() in builtin/grep.c. max_depth == -1 means infinite depth. Depth limit is only effective when pathspec.recursive == 1. When pathspec.recursive == 0, the behavior depends on match functions: non-recursive for tree_entry_interesting() and recursive for match_pathspec{,_depth} Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:44 +01:00
tree_entry_interesting(): give meaningful names to return values It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:10 +02:00			`return within_depth(base_str + matchlen + 1,`
			`baselen - matchlen - 1,`
			`!!S_ISDIR(entry->mode),`
			`ps->max_depth) ?`
			`entry_interesting : entry_not_interesting;`
Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00			`}`

tree-walk: micro-optimization in tree_entry_interesting In the case of a wide breadth top-level tree (~2400 entries, all trees in this case), we can see a noticeable cost in the profiler calling strncmp() here. Most of the time we are at the base level of the repository, so base is "" and baselen == 0, which means we will always test true. Break out this one tiny case so we can short circuit the strncmp() call. Test cases are as follows. packages.git is the Arch Linux git-svn clone of the packages repository which has the characteristics above. Commands: [1] packages.git, /usr/bin/time git log >/dev/null [2] packages.git, /usr/bin/time git log -- autogen/trunk pacman/trunk wget/trunk >/dev/null [3] linux.git, /usr/bin/time git log >/dev/null [4] linux.git, /usr/bin/time git log -- drivers/ata drivers/uio tools >/dev/null Results: before after %faster [1] 2.56 2.55 0.4% [2] 51.82 48.66 6.5% [3] 5.58 5.61 -0.5% [4] 1.55 1.51 0.2% The takeaway here is this doesn't matter in many operations, but it does for a certain style of repository and operation where it nets a 6.5% measured improvement. The other changes are likely not significant by reasonable statistics methods. Note: the measured improvement when originally submitted was ~11% (43 to 38 secs) for operation [2]. At the time, the repository had 117220 commits; it now has 137537 commits. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-09-09 04:02:46 +02:00			`/* Either there must be no base, or the base must match. */`
parse_pathspec: accept :(icase)path syntax Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:09 +02:00			`if (baselen == 0 || !basecmp(item, base_str, match, baselen)) {`
			`if (match_entry(item, entry, pathlen,`
tree_entry_interesting(): refactor into separate smaller functions Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:43 +01:00			`match + baselen, matchlen - baselen,`
			`&never_interesting))`
tree_entry_interesting(): give meaningful names to return values It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:10 +02:00			`return entry_interesting;`
tree_entry_interesting(): optimize wildcard matching when base is matched If base is already matched, skip that part when calling fnmatch(). This happens quite often if users start a command from worktree's subdirectory and prefix is usually prepended to all pathspecs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:47 +01:00
pathspec: save the non-wildcard length part We mark pathspec with wildcards with the field use_wildcard. We could do better by saving the length of the non-wildcard part, which can be used for optimizations such as f9f6e2c (exclude: do strcmp as much as possible before fnmatch - 2012-06-07). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-11-18 10:13:06 +01:00			`if (item->nowildcard_len < item->len) {`
pathspec: support :(glob) syntax :(glob)path differs from plain pathspec that it uses wildmatch with WM_PATHNAME while the other uses fnmatch without FNM_PATHNAME. The difference lies in how '*' (and '**') is processed. With the introduction of :(glob) and :(literal) and their global options --[no]glob-pathspecs, the user can: - make everything literal by default via --noglob-pathspecs --literal-pathspecs cannot be used for this purpose as it disables _all_ pathspec magic. - individually turn on globbing with :(glob) - make everything globbing by default via --glob-pathspecs - individually turn off globbing with :(literal) The implication behind this is, there is no way to gain the default matching behavior (i.e. fnmatch without FNM_PATHNAME). You either get new globbing or literal. The old fnmatch behavior is considered deprecated and discouraged to use. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:36:08 +02:00			`if (!git_fnmatch(item, match + baselen, entry->path,`
pathspec: apply "*.c" optimization from exclude When a pattern contains only a single asterisk as wildcard, e.g. "foo*bar", after literally comparing the leading part "foo" with the string, we can compare the tail of the string and make sure it matches "bar", instead of running fnmatch() on "*bar" against the remainder of the string. -O2 build on linux-2.6, without the patch: $ time git rev-list --quiet HEAD -- '*.c' real 0m40.770s user 0m40.290s sys 0m0.256s With the patch $ time ~/w/git/git rev-list --quiet HEAD -- '*.c' real 0m34.288s user 0m33.997s sys 0m0.205s The above command is not supposed to be widely popular. It's chosen because it exercises pathspec matching a lot. The point is it cuts down matching time for popular patterns like *.c, which could be used as pathspec in other places. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-11-24 05:33:50 +01:00			`item->nowildcard_len - baselen))`
tree_entry_interesting(): give meaningful names to return values It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:10 +02:00			`return entry_interesting;`
tree_entry_interesting(): optimize wildcard matching when base is matched If base is already matched, skip that part when calling fnmatch(). This happens quite often if users start a command from worktree's subdirectory and prefix is usually prepended to all pathspecs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:47 +01:00
			`/*`
			`* Match all directories. We'll try to`
			`* match files later on.`
			`*/`
			`if (ps->recursive && S_ISDIR(entry->mode))`
tree_entry_interesting(): give meaningful names to return values It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 08:36:10 +02:00			`return entry_interesting;`
tree_entry_interesting(): optimize wildcard matching when base is matched If base is already matched, skip that part when calling fnmatch(). This happens quite often if users start a command from worktree's subdirectory and prefix is usually prepended to all pathspecs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:47 +01:00			`}`

			`continue;`
Move tree_entry_interesting() to tree-walk.c and export it Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:40 +01:00			`}`

match_wildcards:
		if (item->nowildcard_len == item->len)
			continue;

		if (item->nowildcard_len &&
		    !match_wildcard_base(item, base_str, baselen, &matched))
			continue;

		/*
		 * Concatenate base and entry->path into one and do
		 * fnmatch() on it.
		 *
		 * While we could avoid concatenation in certain cases
		 * [1], which saves a memcpy and potentially a
		 * realloc, it turns out not to be worth it. Measurement on
		 * linux-2.6 does not show any clear improvements,
		 * partly because of the nowildcard_len optimization
		 * in git_fnmatch(). Avoid micro-optimizations here.
		 *
		 * [1] if match_wildcard_base() says the base
		 * directory is already matched, we only need to match
		 * the rest, which is shorter so _in theory_ faster.
		 */

		strbuf_add(base, entry->path, pathlen);

		if (!git_fnmatch(item, match, base->buf + base_offset,
				 item->nowildcard_len)) {
			strbuf_setlen(base, base_offset + baselen);
			return entry_interesting;
		}
		strbuf_setlen(base, base_offset + baselen);

		/*
		 * Match all directories. We'll try to match files
		 * later on.
		 * max_depth is ignored, but we may consider supporting it
		 * in the future, see
		 * http://thread.gmane.org/gmane.comp.version-control.git/163757/focus=163840
		 */
		if (ps->recursive && S_ISDIR(entry->mode))
			return entry_interesting;
	}
	return never_interesting; /* No matches */
}
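The matching loop that ends above (the tail of do_match(), which tree_entry_interesting() calls) prunes on the literal, non-wildcard prefix of a pattern, then concatenates base and entry path and glob-matches the result, letting directories through when recursing. The following is a minimal standalone sketch of that flow under stated assumptions, not git's implementation: it uses POSIX fnmatch(3) with no flags in place of git_fnmatch()/wildmatch, ignores :(glob), :(icase) and max_depth, and the names wc_match() and wc_nowildcard_len() are hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes; 0 and 1 mirror entry_not_interesting and
 * entry_interesting in git's enum interesting. */
enum wc_result { WC_NOT_INTERESTING = 0, WC_INTERESTING = 1 };

/* Length of the leading pattern part free of glob metacharacters
 * (a rough stand-in for git's cached nowildcard_len). */
static size_t wc_nowildcard_len(const char *pattern)
{
	return strcspn(pattern, "*?[\\");
}

/*
 * base is empty or ends in '/'; name is the entry inside base.
 * Prune on the literal prefix, glob-match the concatenated path,
 * and keep directories alive when recursing.
 */
static enum wc_result wc_match(const char *pattern, const char *base,
			       const char *name, int is_dir, int recursive)
{
	char path[4096];
	size_t baselen, fixed, common;
	int n;

	assert(pattern && base && name);
	baselen = strlen(base);
	fixed = wc_nowildcard_len(pattern);

	/* The literal prefix and base must agree on their common length,
	 * otherwise nothing under base can match (cf. match_wildcard_base). */
	common = fixed < baselen ? fixed : baselen;
	if (strncmp(pattern, base, common) != 0)
		return WC_NOT_INTERESTING;

	/* Concatenate base and name, then glob-match the full path.
	 * Flags 0 (no FNM_PATHNAME): '*' may cross '/'. */
	n = snprintf(path, sizeof(path), "%s%s", base, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return WC_NOT_INTERESTING; /* too long for this sketch */
	if (fnmatch(pattern, path, 0) == 0)
		return WC_INTERESTING;

	/* Directories stay interesting: files inside may still match. */
	if (recursive && is_dir)
		return WC_INTERESTING;
	return WC_NOT_INTERESTING;
}
```

With the pattern "drivers/*.c", wc_match() rejects every entry under "fs/" on the prefix check alone, without calling fnmatch(), yet accepts "drivers/net/foo.c" because '*' crosses '/' when FNM_PATHNAME is not set.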

/*
 * Is a tree entry interesting given the pathspec we have?
 *
 * Pre-condition: either baselen == base_offset (i.e. empty path)
 * or base[baselen-1] == '/' (i.e. with trailing slash).
 */
enum interesting tree_entry_interesting(const struct name_entry *entry,
					struct strbuf *base, int base_offset,
					const struct pathspec *ps)
{
	enum interesting positive, negative;
	positive = do_match(entry, base, base_offset, ps, 0);

	/*
	 * Values: -1 = all_entries_not_interesting, 0 = entry_not_interesting,
	 *          1 = entry_interesting, 2 = all_entries_interesting.
	 *
	 * case | entry | positive | negative | result
	 * -----+-------+----------+----------+-------
	 *   1  |  file |   -1     |  -1..2   |  -1
	 *   2  |  file |    0     |  -1..2   |   0
	 *   3  |  file |    1     |   -1     |   1
	 *   4  |  file |    1     |    0     |   1
	 *   5  |  file |    1     |    1     |   0
	 *   6  |  file |    1     |    2     |   0
	 *   7  |  file |    2     |   -1     |   2
	 *   8  |  file |    2     |    0     |   2
	 *   9  |  file |    2     |    1     |   0
	 *  10  |  file |    2     |    2     |  -1
	 * -----+-------+----------+----------+-------
	 *  11  |  dir  |   -1     |  -1..2   |  -1
	 *  12  |  dir  |    0     |  -1..2   |   0
	 *  13  |  dir  |    1     |   -1     |   1
	 *  14  |  dir  |    1     |    0     |   1
	 *  15  |  dir  |    1     |    1     |   1 (*)
	 *  16  |  dir  |    1     |    2     |   0
	 *  17  |  dir  |    2     |   -1     |   2
	 *  18  |  dir  |    2     |    0     |   2
	 *  19  |  dir  |    2     |    1     |   1 (*)
	 *  20  |  dir  |    2     |    2     |  -1
	 *
	 * (*) An exclude pattern interested in a directory does not
	 * necessarily mean it will exclude all of the directory. In the
	 * wildcard case, it can't decide until looking at individual
	 * files inside. So don't write such directories off yet.
	 */

	if (!(ps->magic & PATHSPEC_EXCLUDE) ||
	    positive <= entry_not_interesting) /* #1, #2, #11, #12 */
		return positive;

	negative = do_match(entry, base, base_offset, ps, 1);

	/* #3, #4, #7, #8, #13, #14, #17, #18 */
	if (negative <= entry_not_interesting)
		return positive;

	/* #15, #19 */
	if (S_ISDIR(entry->mode) &&
	    positive >= entry_interesting &&
	    negative == entry_interesting)
		return entry_interesting;

	if ((positive == entry_interesting &&
	     negative >= entry_interesting) || /* #5, #6, #16 */
	    (positive == all_entries_interesting &&
	     negative == entry_interesting)) /* #9 */
		return entry_not_interesting;

	return all_entries_not_interesting; /* #10, #20 */
}
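The case table in tree_entry_interesting() can be checked mechanically. Below is a hedged sketch that restates the combination logic as a pure function over the four values -1..2 (mirroring git's enum interesting) and verifies it against all 20 rows. sk_combine(), the sk_* names and the row table are hypothetical; unlike the original, the sketch takes both results up front instead of calling do_match() a second time only when an exclude pathspec exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values mirror git's enum interesting (tree-walk.h). */
enum sk_interesting {
	SK_ALL_NOT = -1, /* all_entries_not_interesting */
	SK_NOT = 0,      /* entry_not_interesting */
	SK_YES = 1,      /* entry_interesting */
	SK_ALL = 2       /* all_entries_interesting */
};

/*
 * positive: result of the ordinary pathspecs; negative: result of the
 * :(exclude) pathspecs; has_exclude: whether any exclude pathspec exists.
 */
static enum sk_interesting sk_combine(bool is_dir, bool has_exclude,
				      enum sk_interesting positive,
				      enum sk_interesting negative)
{
	assert(positive >= SK_ALL_NOT && positive <= SK_ALL);
	assert(negative >= SK_ALL_NOT && negative <= SK_ALL);

	if (!has_exclude || positive <= SK_NOT)           /* #1, #2, #11, #12 */
		return positive;
	if (negative <= SK_NOT)      /* #3, #4, #7, #8, #13, #14, #17, #18 */
		return positive;
	if (is_dir && negative == SK_YES)                 /* #15, #19 */
		return SK_YES;
	if ((positive == SK_YES && negative >= SK_YES) || /* #5, #6, #16 */
	    (positive == SK_ALL && negative == SK_YES))   /* #9 */
		return SK_NOT;
	return SK_ALL_NOT;                                /* #10, #20 */
}

/* One row per case; neg_lo..neg_hi covers the "-1..2" rows. */
struct sk_row { bool dir; int pos, neg_lo, neg_hi, result; };

static const struct sk_row sk_table[] = {
	{false, -1, -1, 2, -1}, {false, 0, -1, 2, 0},  /* #1, #2 */
	{false, 1, -1, -1, 1},  {false, 1, 0, 0, 1},   /* #3, #4 */
	{false, 1, 1, 1, 0},    {false, 1, 2, 2, 0},   /* #5, #6 */
	{false, 2, -1, -1, 2},  {false, 2, 0, 0, 2},   /* #7, #8 */
	{false, 2, 1, 1, 0},    {false, 2, 2, 2, -1},  /* #9, #10 */
	{true, -1, -1, 2, -1},  {true, 0, -1, 2, 0},   /* #11, #12 */
	{true, 1, -1, -1, 1},   {true, 1, 0, 0, 1},    /* #13, #14 */
	{true, 1, 1, 1, 1},     {true, 1, 2, 2, 0},    /* #15, #16 */
	{true, 2, -1, -1, 2},   {true, 2, 0, 0, 2},    /* #17, #18 */
	{true, 2, 1, 1, 1},     {true, 2, 2, 2, -1},   /* #19, #20 */
};

/* Count (row, negative) pairs where sk_combine() disagrees with the table. */
static int sk_table_mismatches(void)
{
	int bad = 0;

	for (size_t i = 0; i < sizeof(sk_table) / sizeof(sk_table[0]); i++) {
		const struct sk_row *r = &sk_table[i];

		for (int n = r->neg_lo; n <= r->neg_hi; n++)
			if ((int)sk_combine(r->dir, true,
					    (enum sk_interesting)r->pos,
					    (enum sk_interesting)n) != r->result)
				bad++;
	}
	return bad;
}
```

The sketch drops the `positive >= entry_interesting` test from the #15/#19 branch: once the first two checks have returned, positive can only be 1 or 2, so that condition is always true at that point.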