mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-16 14:04:52 +01:00

530 lines

14 KiB

C

Raw Normal View History

[PATCH] Diff-tree-helper take two. This reworks the diff-tree-helper and show-diff to further make external diff command interface simpler. These commands now honor GIT_EXTERNAL_DIFF environment variable which can point at an arbitrary program that takes 7 parameters: name file1 file1-sha1 file1-mode file2 file2-sha1 file2-mode The parameters for an external diff command are as follows: name this invocation of the command is to emit diff for the named cache/tree entry. file1 pathname that holds the contents of the first file. This can be a file inside the working tree, or a temporary file created from the blob object, or /dev/null. The command should not attempt to unlink it -- the temporary is unlinked by the caller. file1-sha1 sha1 hash if file1 is a blob object, or "." otherwise. file1-mode mode bits for file1, or "." for a deleted file. If GIT_EXTERNAL_DIFF environment variable is not set, the default is to invoke diff with the set of parameters old show-diff used to use. This built-in implementation honors the GIT_DIFF_CMD and GIT_DIFF_OPTS environment variables as before. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-26 18:25:05 +02:00			`/*`
			`* Copyright (C) 2005 Junio C Hamano`
			`*/`
[PATCH] Split external diff command interface to a separate file. With this patch, the non-core'ish part of show-diff command that invokes an external "diff" comand to obtain patches is split into a separate file. The next patch will introduce a new command, diff-tree-helper, which uses this common diff interface to format diff-tree and diff-cache output into a patch form. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-26 03:22:47 +02:00			`#include "cache.h"`
[PATCH] Make sq_expand() available as sq_quote(). A useful shell safety helper sq_expand() was hidden as a static function in diff.c. Extract it out and make it available as sq_quote(). Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-07-08 08:58:32 +02:00			`#include "quote.h"`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`#include "commit.h"`
[PATCH] Split external diff command interface to a separate file. With this patch, the non-core'ish part of show-diff command that invokes an external "diff" comand to obtain patches is split into a separate file. The next patch will introduce a new command, diff-tree-helper, which uses this common diff interface to format diff-tree and diff-cache output into a patch form. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-26 03:22:47 +02:00			`#include "diff.h"`
[PATCH] Diff overhaul, adding half of copy detection. This introduces the diff-core, the layer between the diff-tree family and the external diff interface engine. The calls to the interface diff-tree family uses (diff_change and diff_addremove) have not changed and will not change. The purpose of the diff-core layer is to provide an infrastructure to transform the set of differences sent from the applications, before sending them to the external diff interface. The recently introduced rename detection code has been rewritten to use the diff-core facility. When applications send in separate creates and deletes, matching ones are transformed into a single rename-and-edit diff, and sent out to the external diff interface as such. This patch also enhances the rename detection code further to be able to detect copies. Currently this happens only as long as copy sources appear as part of the modified files, but there already is enough provision for callers to report unmodified files to diff-core, so that they can be also used as copy source candidates. Extending the callers this way will be done in a separate patch. Please see and marvel at how well this works by trying out the newly added t/t4003-diff-rename-1.sh test script. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:39:09 +02:00			`#include "diffcore.h"`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`#include "revision.h"`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`#include "cache-tree.h"`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`#include "unpack-trees.h"`
diff-index: careful when inspecting work tree items Earlier, if you changed a staged path into a directory in the work tree, we happily ran lstat(2) on it and found that it exists, and declared that the user changed it to a gitlink. This is wrong for two reasons: (1) It may be a directory, but it may not be a submodule, and in the latter case, the change we need to report is "the blob at the path has disappeared". We need to check with resolve_gitlink_ref() to be consistent with what "git add" and "git update-index --add" does. (2) lstat(2) may have succeeded only because a leading component of the path was turned into a symbolic link that points at something that exists in the work tree. In such a case, the path itself does not exist anymore, as far as the index is concerned. This fixes these breakages in diff-index that the previous patch has exposed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:29:48 +02:00			`#include "refs.h"`
Show submodules as modified when they contain a dirty work tree Until now a submodule only then showed up as modified in the supermodule when the last commit in the submodule differed from the one in the index or the diffed against commit of the superproject. A dirty work tree containing new untracked or modified files in a submodule was undetectable when looking at it from the superproject. Now git status and git diff (against the work tree) in the superproject will also display submodules as modified when they contain untracked or modified files, even if the compared ref matches the HEAD of the submodule. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-16 18:42:24 +01:00			`#include "submodule.h"`
[PATCH] Diff overhaul, adding half of copy detection. This introduces the diff-core, the layer between the diff-tree family and the external diff interface engine. The calls to the interface diff-tree family uses (diff_change and diff_addremove) have not changed and will not change. The purpose of the diff-core layer is to provide an infrastructure to transform the set of differences sent from the applications, before sending them to the external diff interface. The recently introduced rename detection code has been rewritten to use the diff-core facility. When applications send in separate creates and deletes, matching ones are transformed into a single rename-and-edit diff, and sent out to the external diff interface as such. This patch also enhances the rename detection code further to be able to detect copies. Currently this happens only as long as copy sources appear as part of the modified files, but there already is enough provision for callers to report unmodified files to diff-core, so that they can be also used as copy source candidates. Extending the callers this way will be done in a separate patch. Please see and marvel at how well this works by trying out the newly added t/t4003-diff-rename-1.sh test script. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:39:09 +02:00
Optimize diff-cache -p --cached This patch optimizes "diff-cache -p --cached" by avoiding to inflate blobs into temporary files when the blob recorded in the cache matches the corresponding file in the work tree. The file in the work tree is passed as the comparison source in such a case instead. This optimization kicks in only when we have already read the cache this optimization and this is deliberate. Especially, diff-tree does not use this code, because changes are contained in small number of files relative to the project size most of the time, and reading cache is so expensive for a large project that the cost of reading it outweighs the savings by not inflating blobs. Also this patch cleans up the structure passed from diff clients by removing one unused structure member. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-04 10:45:24 +02:00			`/*`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`* diff-files`
Optimize diff-cache -p --cached This patch optimizes "diff-cache -p --cached" by avoiding to inflate blobs into temporary files when the blob recorded in the cache matches the corresponding file in the work tree. The file in the work tree is passed as the comparison source in such a case instead. This optimization kicks in only when we have already read the cache this optimization and this is deliberate. Especially, diff-tree does not use this code, because changes are contained in small number of files relative to the project size most of the time, and reading cache is so expensive for a large project that the cost of reading it outweighs the savings by not inflating blobs. Also this patch cleans up the structure passed from diff clients by removing one unused structure member. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-04 10:45:24 +02:00			`*/`
[PATCH] Optimize diff-tree -[CM] --stdin This attempts to optimize "diff-tree -[CM] --stdin", which compares successible tree pairs. This optimization does not make much sense for other commands in the diff-* brothers. When reading from --stdin and using rename/copy detection, the patch makes diff-tree to read the current index file first. This is done to reuse the optimization used by diff-cache in the non-cached case. Similarity estimator can avoid expanding a blob if the index says what is in the work tree has an exact copy of that blob already expanded. Another optimization the patch makes is to check only file sizes first to terminate similarity estimation early. In order for this to work, it needs a way to tell the size of the blob without expanding it. Since an obvious way of doing it, which is to keep all the blobs previously used in the memory, is too costly, it does so by keeping the filesize for each object it has already seen in memory. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-28 00:56:38 +02:00
diff-index: careful when inspecting work tree items Earlier, if you changed a staged path into a directory in the work tree, we happily ran lstat(2) on it and found that it exists, and declared that the user changed it to a gitlink. This is wrong for two reasons: (1) It may be a directory, but it may not be a submodule, and in the latter case, the change we need to report is "the blob at the path has disappeared". We need to check with resolve_gitlink_ref() to be consistent with what "git add" and "git update-index --add" does. (2) lstat(2) may have succeeded only because a leading component of the path was turned into a symbolic link that points at something that exists in the work tree. In such a case, the path itself does not exist anymore, as far as the index is concerned. This fixes these breakages in diff-index that the previous patch has exposed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:29:48 +02:00			`/*`
diff: a submodule not checked out is not modified 948dd34 (diff-index: careful when inspecting work tree items, 2008-03-30) made the work tree check careful not to be fooled by a new directory that exists at a place the index expects a blob. For such a change to be a typechange from blob to submodule, the new directory has to be a repository. However, if the index expects a submodule there, we should not insist the work tree entity to be a repository --- a simple directory that is not a full fledged repository (even an empty directory would do) should be considered an unmodified subproject, because that is how a superproject with a submodule is checked out sparsely by default. This makes the function check_work_tree_entity() even more careful not to report a submodule that is not checked out as removed. It fixes the recently added test in t4027. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-04 02:04:42 +02:00			`* Has the work tree entity been removed?`
			`*`
			`* Return 1 if it was removed from the work tree, 0 if an entity to be`
			`* compared with the cache entry ce still exists (the latter includes`
			`* the case where a directory that is not a submodule repository`
			`* exists for ce that is a submodule -- it is a submodule that is not`
			`* checked out). Return negative for an error.`
diff-index: careful when inspecting work tree items Earlier, if you changed a staged path into a directory in the work tree, we happily ran lstat(2) on it and found that it exists, and declared that the user changed it to a gitlink. This is wrong for two reasons: (1) It may be a directory, but it may not be a submodule, and in the latter case, the change we need to report is "the blob at the path has disappeared". We need to check with resolve_gitlink_ref() to be consistent with what "git add" and "git update-index --add" does. (2) lstat(2) may have succeeded only because a leading component of the path was turned into a symbolic link that points at something that exists in the work tree. In such a case, the path itself does not exist anymore, as far as the index is concerned. This fixes these breakages in diff-index that the previous patch has exposed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:29:48 +02:00			`*/`
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at all when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-09 18:21:07 +02:00			`static int check_removed(const struct cache_entry ce, struct stat st)`
diff-index: careful when inspecting work tree items Earlier, if you changed a staged path into a directory in the work tree, we happily ran lstat(2) on it and found that it exists, and declared that the user changed it to a gitlink. This is wrong for two reasons: (1) It may be a directory, but it may not be a submodule, and in the latter case, the change we need to report is "the blob at the path has disappeared". We need to check with resolve_gitlink_ref() to be consistent with what "git add" and "git update-index --add" does. (2) lstat(2) may have succeeded only because a leading component of the path was turned into a symbolic link that points at something that exists in the work tree. In such a case, the path itself does not exist anymore, as far as the index is concerned. This fixes these breakages in diff-index that the previous patch has exposed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:29:48 +02:00			`{`
			`if (lstat(ce->name, st) < 0) {`
			`if (errno != ENOENT && errno != ENOTDIR)`
			`return -1;`
			`return 1;`
			`}`
lstat_cache(): swap func(length, string) into func(string, length) Swap function argument pair (length, string) into (string, length) to conform with the commonly used order inside the GIT source code. Also, add a note about this fact into the coding guidelines. Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-09 21:54:06 +01:00			`if (has_symlink_leading_path(ce->name, ce_namelen(ce)))`
diff-index: careful when inspecting work tree items Earlier, if you changed a staged path into a directory in the work tree, we happily ran lstat(2) on it and found that it exists, and declared that the user changed it to a gitlink. This is wrong for two reasons: (1) It may be a directory, but it may not be a submodule, and in the latter case, the change we need to report is "the blob at the path has disappeared". We need to check with resolve_gitlink_ref() to be consistent with what "git add" and "git update-index --add" does. (2) lstat(2) may have succeeded only because a leading component of the path was turned into a symbolic link that points at something that exists in the work tree. In such a case, the path itself does not exist anymore, as far as the index is concerned. This fixes these breakages in diff-index that the previous patch has exposed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:29:48 +02:00			`return 1;`
			`if (S_ISDIR(st->st_mode)) {`
			`unsigned char sub[20];`
diff: a submodule not checked out is not modified 948dd34 (diff-index: careful when inspecting work tree items, 2008-03-30) made the work tree check careful not to be fooled by a new directory that exists at a place the index expects a blob. For such a change to be a typechange from blob to submodule, the new directory has to be a repository. However, if the index expects a submodule there, we should not insist the work tree entity to be a repository --- a simple directory that is not a full fledged repository (even an empty directory would do) should be considered an unmodified subproject, because that is how a superproject with a submodule is checked out sparsely by default. This makes the function check_work_tree_entity() even more careful not to report a submodule that is not checked out as removed. It fixes the recently added test in t4027. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-04 02:04:42 +02:00
			`/*`
			`* If ce is already a gitlink, we can have a plain`
			`* directory (i.e. the submodule is not checked out),`
			`* or a checked out submodule. Either case this is not`
			`* a case where something was removed from the work tree,`
			`* so we will return 0.`
			`*`
			`* Otherwise, if the directory is not a submodule`
			`* repository, that means ce which was a blob turned into`
			`* a directory --- the blob was removed!`
			`*/`
			`if (!S_ISGITLINK(ce->ce_mode) &&`
			`resolve_gitlink_ref(ce->name, "HEAD", sub))`
diff-index: careful when inspecting work tree items Earlier, if you changed a staged path into a directory in the work tree, we happily ran lstat(2) on it and found that it exists, and declared that the user changed it to a gitlink. This is wrong for two reasons: (1) It may be a directory, but it may not be a submodule, and in the latter case, the change we need to report is "the blob at the path has disappeared". We need to check with resolve_gitlink_ref() to be consistent with what "git add" and "git update-index --add" does. (2) lstat(2) may have succeeded only because a leading component of the path was turned into a symbolic link that points at something that exists in the work tree. In such a case, the path itself does not exist anymore, as far as the index is concerned. This fixes these breakages in diff-index that the previous patch has exposed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:29:48 +02:00			`return 1;`
			`}`
			`return 0;`
			`}`
Teach git-diff-files the new option `--no-index` With this flag and given two paths, git-diff-files behaves as a GNU diff lookalike (plus the git goodies like --check, colour, etc.). This flag is also available in git-diff. It also works outside of a git repository. In addition, if git-diff{,-files} is called without revision or stage parameter, and with exactly two paths at least one of which is not tracked, the default is --no-index. So, you can now say git diff /etc/inittab /etc/fstab and it actually works! This also unifies the duplicated argument parsing between cmd_diff_files() and builtin_diff_files(). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-22 21:50:10 +01:00
Refactor dirty submodule detection in diff-lib.c Moving duplicated code into the new function match_stat_with_submodule(). Replacing the implicit activation of detailed checks for the dirtiness of submodules when DIFF_FORMAT_PATCH was selected with explicitly setting the recently added DIFF_OPT_DIRTY_SUBMODULES option in diff_setup_done(). Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-11 22:50:25 +01:00			`/*`
			`* Has a file changed or has a submodule new commits or a dirty work tree?`
			`*`
			`* Return 1 when changes are detected, 0 otherwise. If the DIRTY_SUBMODULES`
			`* option is set, the caller does not only want to know if a submodule is`
			`* modified at all but wants to know all the conditions that are met (new`
			`* commits, untracked content and/or modified content).`
			`*/`
			`static int match_stat_with_submodule(struct diff_options *diffopt,`
diff-lib, read-tree, unpack-trees: mark cache_entry pointers const Add const to struct cache_entry pointers throughout the tree which are only used for reading. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:55 +02:00			`const struct cache_entry *ce,`
			`struct stat *st, unsigned ce_option,`
			`unsigned *dirty_submodule)`
Refactor dirty submodule detection in diff-lib.c Moving duplicated code into the new function match_stat_with_submodule(). Replacing the implicit activation of detailed checks for the dirtiness of submodules when DIFF_FORMAT_PATCH was selected with explicitly setting the recently added DIFF_OPT_DIRTY_SUBMODULES option in diff_setup_done(). Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-11 22:50:25 +01:00			`{`
			`int changed = ce_match_stat(ce, st, ce_option);`
Submodules: Add the new "ignore" config option for diff and status The new "ignore" config option controls the default behavior for "git status" and the diff family. It specifies under what circumstances they consider submodules as modified and can be set separately for each submodule. The command line option "--ignore-submodules=" has been extended to accept the new parameter "none" for both status and diff. Users that chose submodules to get rid of long work tree scanning times might want to set the "dirty" option for those submodules. This brings back the pre 1.7.0 behavior, where submodule work trees were never scanned for modifications. By using "--ignore-submodules=none" on the command line the status and diff commands can be told to do a full scan. This option can be set to the following values (which have the same name and meaning as for the "--ignore-submodules" option of status and diff): "all": All changes to the submodule will be ignored. "dirty": Only differences of the commit recorded in the superproject and the submodules HEAD will be considered modifications, all changes to the work tree of the submodule will be ignored. When using this value, the submodule will not be scanned for work tree changes at all, leading to a performance benefit on large submodules. "untracked": Only untracked files in the submodules work tree are ignored, a changed HEAD and/or modified files in the submodule will mark it as modified. "none" (which is the default): Either untracked or modified files in a submodules work tree or a difference between the subdmodules HEAD and the commit recorded in the superproject will make it show up as changed. This value is added as a new parameter for the "--ignore-submodules" option of the diff family and "git status" so the user can override the settings in the configuration. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-06 00:39:25 +02:00			`if (S_ISGITLINK(ce->ce_mode)) {`
			`unsigned orig_flags = diffopt->flags;`
			`if (!DIFF_OPT_TST(diffopt, OVERRIDE_SUBMODULE_CONFIG))`
			`set_diffopt_flags_from_submodule_config(diffopt, ce->name);`
			`if (DIFF_OPT_TST(diffopt, IGNORE_SUBMODULES))`
			`changed = 0;`
			`else if (!DIFF_OPT_TST(diffopt, IGNORE_DIRTY_SUBMODULES)`
			`&& (!changed \|\| DIFF_OPT_TST(diffopt, DIRTY_SUBMODULES)))`
			`*dirty_submodule = is_submodule_modified(ce->name, DIFF_OPT_TST(diffopt, IGNORE_UNTRACKED_IN_SUBMODULES));`
			`diffopt->flags = orig_flags;`
Refactor dirty submodule detection in diff-lib.c Moving duplicated code into the new function match_stat_with_submodule(). Replacing the implicit activation of detailed checks for the dirtiness of submodules when DIFF_FORMAT_PATCH was selected with explicitly setting the recently added DIFF_OPT_DIRTY_SUBMODULES option in diff_setup_done(). Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-11 22:50:25 +01:00			`}`
			`return changed;`
			`}`

ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`int run_diff_files(struct rev_info *revs, unsigned int option)`
[PATCH] Optimize diff-tree -[CM] --stdin This attempts to optimize "diff-tree -[CM] --stdin", which compares successible tree pairs. This optimization does not make much sense for other commands in the diff-* brothers. When reading from --stdin and using rename/copy detection, the patch makes diff-tree to read the current index file first. This is done to reuse the optimization used by diff-cache in the non-cached case. Similarity estimator can avoid expanding a blob if the index says what is in the work tree has an exact copy of that blob already expanded. Another optimization the patch makes is to check only file sizes first to terminate similarity estimation early. In order for this to work, it needs a way to tell the size of the blob without expanding it. Since an obvious way of doing it, which is to keep all the blobs previously used in the memory, is too costly, it does so by keeping the filesize for each object it has already seen in memory. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-28 00:56:38 +02:00			`{`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`int entries, i;`
			`int diff_unmerged_stage = revs->max_count;`
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`int silent_on_removed = option & DIFF_SILENT_ON_REMOVED;`
git-add: make the entry stat-clean after re-adding the same contents Earlier in commit 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe (add_file_to_index: skip rehashing if the cached stat already matches), add_file_to_index() were taught not to re-add the path if it already matches the index. The change meant well, but was not executed quite right. It used ie_modified() to see if the file on the work tree is really different from the index, and skipped adding the contents if the function says "not modified". This was wrong. There are three possible comparison results between the index and the file in the work tree: - with lstat(2) we _know_ they are different. E.g. if the length or the owner in the cached stat information is different from the length we just obtained from lstat(2), we can tell the file is modified without looking at the actual contents. - with lstat(2) we _know_ they are the same. The same length, the same owner, the same everything (but this has a twist, as described below). - we cannot tell from lstat(2) information alone and need to go to the filesystem to actually compare. The last case arises from what we call 'racy git' situation, that can be caused with this sequence: $ echo hello >file $ git add file $ echo aeiou >file ;# the same length If the second "echo" is done within the same filesystem timestamp granularity as the first "echo", then the timestamp recorded by "git add" and the timestamp we get from lstat(2) will be the same, and we can mistakenly say the file is not modified. The path is called 'racily clean'. We need to reliably detect racily clean paths are in fact modified. To solve this problem, when we write out the index, we mark the index entry that has the same timestamp as the index file itself (that is the time from the point of view of the filesystem) to tell any later code that does the lstat(2) comparison not to trust the cached stat info, and ie_modified() then actually goes to the filesystem to compare the contents for such a path. That's all good, but it should not be used for this "git add" optimization, as the goal of "git add" is to actually update the path in the index and make it stat-clean. With the false optimization, we did _not_ cause any data loss (after all, what we failed to do was only to update the cached stat information), but it made the following sequence leave the file stat dirty: $ echo hello >file $ git add file $ echo hello >file ;# the same contents $ git add file The solution is not to use ie_modified() which goes to the filesystem to see if it is really clean, but instead use ie_match_stat() with "assume racily clean paths are dirty" option, to force re-adding of such a path. There was another problem with "git add -u". The codepath shares the same issue when adding the paths that are found to be modified, but in addition, it asked "git diff-files" machinery run_diff_files() function (which is "git diff-files") to list the paths that are modified. But "git diff-files" machinery uses the same ie_modified() call so that it does not report racily clean _and_ actually clean paths as modified, which is not what we want. The patch allows the callers of run_diff_files() to pass the same "assume racily clean paths are dirty" option, and makes "git-add -u" codepath to use that option, to discover and re-add racily clean _and_ actually clean paths. We could further optimize on top of this patch to differentiate the case where the path really needs re-adding (i.e. the content of the racily clean entry was indeed different) and the case where only the cached stat information needs to be refreshed (i.e. the racily clean entry was actually clean), but I do not think it is worth it. This patch applies to maint and all the way up. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 03:22:52 +01:00			`unsigned ce_option = ((option & DIFF_RACY_IS_MODIFIED)`
			`? CE_MATCH_RACY_IS_DIRTY : 0);`
[PATCH] Optimize diff-tree -[CM] --stdin This attempts to optimize "diff-tree -[CM] --stdin", which compares successible tree pairs. This optimization does not make much sense for other commands in the diff-* brothers. When reading from --stdin and using rename/copy detection, the patch makes diff-tree to read the current index file first. This is done to reuse the optimization used by diff-cache in the non-cached case. Similarity estimator can avoid expanding a blob if the index says what is in the work tree has an exact copy of that blob already expanded. Another optimization the patch makes is to check only file sizes first to terminate similarity estimation early. In order for this to work, it needs a way to tell the size of the blob without expanding it. Since an obvious way of doing it, which is to keep all the blobs previously used in the memory, is too costly, it does so by keeping the filesize for each object it has already seen in memory. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-28 00:56:38 +02:00
diff: vary default prefix depending on what are compared With a new configuration "diff.mnemonicprefix", "git diff" shows the differences between various combinations of preimage and postimage trees with prefixes different from the standard "a/" and "b/". Hopefully this will make the distinction stand out for some people. "git diff" compares the (i)ndex and the (w)ork tree; "git diff HEAD" compares a (c)ommit and the (w)ork tree; "git diff --cached" compares a (c)ommit and the (i)ndex; "git-diff HEAD:file1 file2" compares an (o)bject and a (w)ork tree entity; "git diff --no-index a b" compares two non-git things (1) and (2). Because these mnemonics now have meanings, they are swapped when reverse diff is in effect and this feature is enabled. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-19 05:08:09 +02:00			`diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");`

Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`if (diff_unmerged_stage < 0)`
			`diff_unmerged_stage = 2;`
run_diff_{files,index}(): update calling convention. They used to open and read index themselves, but they now expect their callers to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-10 03:51:40 +01:00			`entries = active_nr;`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`for (i = 0; i < entries; i++) {`
[PATCH] Diff-tree-helper take two. This reworks the diff-tree-helper and show-diff to further make external diff command interface simpler. These commands now honor GIT_EXTERNAL_DIFF environment variable which can point at an arbitrary program that takes 7 parameters: name file1 file1-sha1 file1-mode file2 file2-sha1 file2-mode The parameters for an external diff command are as follows: name this invocation of the command is to emit diff for the named cache/tree entry. file1 pathname that holds the contents of the first file. This can be a file inside the working tree, or a temporary file created from the blob object, or /dev/null. The command should not attempt to unlink it -- the temporary is unlinked by the caller. file1-sha1 sha1 hash if file1 is a blob object, or "." otherwise. file1-mode mode bits for file1, or "." for a deleted file. If GIT_EXTERNAL_DIFF environment variable is not set, the default is to invoke diff with the set of parameters old show-diff used to use. This built-in implementation honors the GIT_DIFF_CMD and GIT_DIFF_OPTS environment variables as before. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-26 18:25:05 +02:00			`struct stat st;`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`unsigned int oldmode, newmode;`
			`struct cache_entry *ce = active_cache[i];`
			`int changed;`
Performance optimization for detection of modified submodules In the worst case is_submodule_modified() got called three times for each submodule. The information we got from scanning the whole submodule tree the first time can be reused instead. New parameters have been added to diff_change() and diff_addremove(), the information is stored in a new member of struct diff_filespec. Its value is then reused instead of calling is_submodule_modified() again. When no explicit "-dirty" is needed in the output the call to is_submodule_modified() is not necessary when the submodules HEAD already disagrees with the ref of the superproject, as this alone marks it as modified. To achieve that, get_stat_data() got an extra argument. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-18 21:26:18 +01:00			`unsigned dirty_submodule = 0;`
Make git diff-generation use a simpler spawn-like interface Instead of depending of fork() and execve() and doing things in between the two, make the git diff functions do everything up front, and then do a single "spawn_prog()" invocation to run the actual external diff program (if any is even needed). This actually ends up simplifying the code, and should make it much easier to make it efficient under broken operating systems (read: Windows). Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-27 00:51:24 +01:00
diff: futureproof "stop feeding the backend early" logic Refactor the "do not stop feeding the backend early" logic into a small helper function and use it in both run_diff_files() and diff_tree() that has the stop-early optimization. We may later add other types of diffcore transformation that require to look at the whole result like diff-filter does, and having the logic in a single place is essential for longer term maintainability. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-31 18:14:17 +02:00			`if (diff_can_quit_early(&revs->diffopt))`
Teach --quiet to diff backends. This teaches git-diff-files, git-diff-index and git-diff-tree backends to exit early under --quiet option. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-14 19:12:51 +01:00			`break;`

Convert ce_path_match() to use struct pathspec Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 13:43:07 +01:00			`if (!ce_path_match(ce, &revs->prune_data))`
Make git diff-generation use a simpler spawn-like interface Instead of depending of fork() and execve() and doing things in between the two, make the git diff functions do everything up front, and then do a single "spawn_prog()" invocation to run the actual external diff program (if any is even needed). This actually ends up simplifying the code, and should make it much easier to make it efficient under broken operating systems (read: Windows). Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-27 00:51:24 +01:00			`continue;`
[PATCH] Diff-tree-helper take two. This reworks the diff-tree-helper and show-diff to further make external diff command interface simpler. These commands now honor GIT_EXTERNAL_DIFF environment variable which can point at an arbitrary program that takes 7 parameters: name file1 file1-sha1 file1-mode file2 file2-sha1 file2-mode The parameters for an external diff command are as follows: name this invocation of the command is to emit diff for the named cache/tree entry. file1 pathname that holds the contents of the first file. This can be a file inside the working tree, or a temporary file created from the blob object, or /dev/null. The command should not attempt to unlink it -- the temporary is unlinked by the caller. file1-sha1 sha1 hash if file1 is a blob object, or "." otherwise. file1-mode mode bits for file1, or "." for a deleted file. If GIT_EXTERNAL_DIFF environment variable is not set, the default is to invoke diff with the set of parameters old show-diff used to use. This built-in implementation honors the GIT_DIFF_CMD and GIT_DIFF_OPTS environment variables as before. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-26 18:25:05 +02:00
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`if (ce_stage(ce)) {`
Don't instantiate structures with FAMs. Since structures with `flexible array members' are an incomplete datatype ANSI C99 forbids creating instances of them. This patch removes such an instance from `diff-lib.c' and replaces it with a pointer to a `struct combine_diff_path'. Since all neccessary memory is allocated at once the number of calls to `xmalloc' is not increased. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:05 +02:00			`struct combine_diff_path *dpath;`
diff-files: show unmerged entries correctly Earlier, e9c8409 (diff-index --cached --raw: show tree entry on the LHS for unmerged entries., 2007-01-05) taught the command to show the object name and the mode from the entry coming from the tree side when comparing a tree with an unmerged index. This is a belated companion patch that teaches diff-files to show the mode from the entry coming from the working tree side, when comparing an unmerged index and the working tree. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-04-23 01:19:27 +02:00			`struct diff_filepair *pair;`
			`unsigned int wt_mode = 0;`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`int num_compare_stages = 0;`
Don't instantiate structures with FAMs. Since structures with `flexible array members' are an incomplete datatype ANSI C99 forbids creating instances of them. This patch removes such an instance from `diff-lib.c' and replaces it with a pointer to a `struct combine_diff_path'. Since all neccessary memory is allocated at once the number of calls to `xmalloc' is not increased. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:05 +02:00			`size_t path_len;`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00
Don't instantiate structures with FAMs. Since structures with `flexible array members' are an incomplete datatype ANSI C99 forbids creating instances of them. This patch removes such an instance from `diff-lib.c' and replaces it with a pointer to a `struct combine_diff_path'. Since all neccessary memory is allocated at once the number of calls to `xmalloc' is not increased. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:05 +02:00			`path_len = ce_namelen(ce);`

diff --cc: fix display of symlink conflicts during a merge. "git-diff-files --cc" to show conflicts during merge did not pass the correct mode information for the working tree down, and showed bogus combined diff. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 07:24:47 +01:00			`dpath = xmalloc(combine_diff_path_size(5, path_len));`
Don't instantiate structures with FAMs. Since structures with `flexible array members' are an incomplete datatype ANSI C99 forbids creating instances of them. This patch removes such an instance from `diff-lib.c' and replaces it with a pointer to a `struct combine_diff_path'. Since all neccessary memory is allocated at once the number of calls to `xmalloc' is not increased. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:05 +02:00			`dpath->path = (char *) &(dpath->parent[5]);`

			`dpath->next = NULL;`
			`dpath->len = path_len;`
			`memcpy(dpath->path, ce->name, path_len);`
			`dpath->path[path_len] = '\0';`
Convert memset(hash,0,20) to hashclr(hash). In the same spirit as hashcmp() and hashcpy(). Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-23 22:57:23 +02:00			`hashclr(dpath->sha1);`
Don't instantiate structures with FAMs. Since structures with `flexible array members' are an incomplete datatype ANSI C99 forbids creating instances of them. This patch removes such an instance from `diff-lib.c' and replaces it with a pointer to a `struct combine_diff_path'. Since all neccessary memory is allocated at once the number of calls to `xmalloc' is not increased. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:05 +02:00			`memset(&(dpath->parent[0]), 0,`
diff --cc: fix display of symlink conflicts during a merge. "git-diff-files --cc" to show conflicts during merge did not pass the correct mode information for the working tree down, and showed bogus combined diff. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 07:24:47 +01:00			`sizeof(struct combine_diff_parent)*5);`

Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at all when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-09 18:21:07 +02:00			`changed = check_removed(ce, &st);`
diff-files: careful when inspecting work tree items This fixes the same breakage in diff-files. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:30:08 +02:00			`if (!changed)`
diff-files: show unmerged entries correctly Earlier, e9c8409 (diff-index --cached --raw: show tree entry on the LHS for unmerged entries., 2007-01-05) taught the command to show the object name and the mode from the entry coming from the tree side when comparing a tree with an unmerged index. This is a belated companion patch that teaches diff-files to show the mode from the entry coming from the working tree side, when comparing an unmerged index and the working tree. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-04-23 01:19:27 +02:00			`wt_mode = ce_mode_from_stat(ce, st.st_mode);`
diff-files: careful when inspecting work tree items This fixes the same breakage in diff-files. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:30:08 +02:00			`else {`
			`if (changed < 0) {`
diff --cc: fix display of symlink conflicts during a merge. "git-diff-files --cc" to show conflicts during merge did not pass the correct mode information for the working tree down, and showed bogus combined diff. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 07:24:47 +01:00			`perror(ce->name);`
			`continue;`
			`}`
			`if (silent_on_removed)`
			`continue;`
diff-files: show unmerged entries correctly Earlier, e9c8409 (diff-index --cached --raw: show tree entry on the LHS for unmerged entries., 2007-01-05) taught the command to show the object name and the mode from the entry coming from the tree side when comparing a tree with an unmerged index. This is a belated companion patch that teaches diff-files to show the mode from the entry coming from the working tree side, when comparing an unmerged index and the working tree. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-04-23 01:19:27 +02:00			`wt_mode = 0;`
diff --cc: fix display of symlink conflicts during a merge. "git-diff-files --cc" to show conflicts during merge did not pass the correct mode information for the working tree down, and showed bogus combined diff. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 07:24:47 +01:00			`}`
diff-files: show unmerged entries correctly Earlier, e9c8409 (diff-index --cached --raw: show tree entry on the LHS for unmerged entries., 2007-01-05) taught the command to show the object name and the mode from the entry coming from the tree side when comparing a tree with an unmerged index. This is a belated companion patch that teaches diff-files to show the mode from the entry coming from the working tree side, when comparing an unmerged index and the working tree. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-04-23 01:19:27 +02:00			`dpath->mode = wt_mode;`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00
			`while (i < entries) {`
			`struct cache_entry *nce = active_cache[i];`
			`int stage;`

			`if (strcmp(ce->name, nce->name))`
			`break;`

			`/* Stage #2 (ours) is the first parent,`
			`* stage #3 (theirs) is the second.`
			`*/`
			`stage = ce_stage(nce);`
			`if (2 <= stage) {`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`int mode = nce->ce_mode;`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`num_compare_stages++;`
Convert memcpy(a,b,20) to hashcpy(a,b). This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void. This is a reasonable tradeoff as most call sites already use unsigned char and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-23 08:49:00 +02:00			`hashcpy(dpath->parent[stage-2].sha1, nce->sha1);`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`dpath->parent[stage-2].mode = ce_mode_from_stat(nce, mode);`
Don't instantiate structures with FAMs. Since structures with `flexible array members' are an incomplete datatype ANSI C99 forbids creating instances of them. This patch removes such an instance from `diff-lib.c' and replaces it with a pointer to a `struct combine_diff_path'. Since all neccessary memory is allocated at once the number of calls to `xmalloc' is not increased. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:05 +02:00			`dpath->parent[stage-2].status =`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`DIFF_STATUS_MODIFIED;`
			`}`

			`/* diff against the proper unmerged stage */`
			`if (stage == diff_unmerged_stage)`
			`ce = nce;`
			`i++;`
			`}`
			`/*`
			`* Compensate for loop update`
[PATCH] Optimize diff-tree -[CM] --stdin This attempts to optimize "diff-tree -[CM] --stdin", which compares successible tree pairs. This optimization does not make much sense for other commands in the diff-* brothers. When reading from --stdin and using rename/copy detection, the patch makes diff-tree to read the current index file first. This is done to reuse the optimization used by diff-cache in the non-cached case. Similarity estimator can avoid expanding a blob if the index says what is in the work tree has an exact copy of that blob already expanded. Another optimization the patch makes is to check only file sizes first to terminate similarity estimation early. In order for this to work, it needs a way to tell the size of the blob without expanding it. Since an obvious way of doing it, which is to keep all the blobs previously used in the memory, is too costly, it does so by keeping the filesize for each object it has already seen in memory. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-28 00:56:38 +02:00			`*/`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`i--;`
Diff clean-up. This is a long overdue clean-up to the code for parsing and passing diff options. It also tightens some constness issues. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-21 09:00:47 +02:00
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`if (revs->combine_merges && num_compare_stages == 2) {`
Don't instantiate structures with FAMs. Since structures with `flexible array members' are an incomplete datatype ANSI C99 forbids creating instances of them. This patch removes such an instance from `diff-lib.c' and replaces it with a pointer to a `struct combine_diff_path'. Since all neccessary memory is allocated at once the number of calls to `xmalloc' is not increased. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:05 +02:00			`show_combined_diff(dpath, 2,`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`revs->dense_combined_merges,`
			`revs);`
Don't instantiate structures with FAMs. Since structures with `flexible array members' are an incomplete datatype ANSI C99 forbids creating instances of them. This patch removes such an instance from `diff-lib.c' and replaces it with a pointer to a `struct combine_diff_path'. Since all neccessary memory is allocated at once the number of calls to `xmalloc' is not increased. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:05 +02:00			`free(dpath);`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`continue;`
rename/copy score parsing updates. Better variant, which handles stuff like "4.5%" and rejects "192.168.0.1". Additionally, make sure numbers are unsigned (I'm making them unsigned long just for the hell of it), to make sure that artificial wraparound scenarios don't cause harm. -hpa [jc: with this, -M100 changes its meaning back to 10%. People wanting to say "pure renames only" should now say -M100% or -M1.0; sounds a bit like an earthquake, but arguably things are more consistent this way ;-)] Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-11-21 23:17:12 +01:00			`}`
Don't instantiate structures with FAMs. Since structures with `flexible array members' are an incomplete datatype ANSI C99 forbids creating instances of them. This patch removes such an instance from `diff-lib.c' and replaces it with a pointer to a `struct combine_diff_path'. Since all neccessary memory is allocated at once the number of calls to `xmalloc' is not increased. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:05 +02:00			`free(dpath);`
			`dpath = NULL;`
[PATCH] diff: Clean up diff_scoreopt_parse(). This cleans up diff_scoreopt_parse() function that is used to parse the fractional notation -B, -C and -M option takes. The callers are modified to check for errors and complain. Earlier they silently ignored malformed input and falled back on the default. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-03 10:37:54 +02:00
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`/*`
			`* Show the diff for the 'ce' if we found the one`
			`* from the desired stage.`
			`*/`
diff-files: show unmerged entries correctly Earlier, e9c8409 (diff-index --cached --raw: show tree entry on the LHS for unmerged entries., 2007-01-05) taught the command to show the object name and the mode from the entry coming from the tree side when comparing a tree with an unmerged index. This is a belated companion patch that teaches diff-files to show the mode from the entry coming from the working tree side, when comparing an unmerged index and the working tree. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-04-23 01:19:27 +02:00			`pair = diff_unmerge(&revs->diffopt, ce->name);`
			`if (wt_mode)`
			`pair->two->mode = wt_mode;`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`if (ce_stage(ce) != diff_unmerged_stage)`
			`continue;`
[PATCH] diff: Update -B heuristics. As Linus pointed out on the mailing list discussion, -B should break a files that has many inserts even if it still keeps enough of the original contents, so that the broken pieces can later be matched with other files by -M or -C. However, if such a broken pair does not get picked up by -M or -C, we would want to apply different criteria; namely, regardless of the amount of new material in the result, the determination of "rewrite" should be done by looking at the amount of original material still left in the result. If you still have the original 97 lines from a 100-line document, it does not matter if you add your own 13 lines to make a 110-line document, or if you add 903 lines to make a 1000-line document. It is not a rewrite but an in-place edit. On the other hand, if you did lose 97 lines from the original, it does not matter if you added 27 lines to make a 30-line document or if you added 997 lines to make a 1000-line document. You did a complete rewrite in either case. This patch introduces a post-processing phase that runs after diffcore-rename matches up broken pairs diffcore-break creates. The purpose of this post-processing is to pick up these broken pieces and merge them back into in-place modifications. For this, the score parameter -B option takes is changed into a pair of numbers, and it takes "-B99/80" format when fully spelled out. The first number is the minimum amount of "edit" (same definition as what diffcore-rename uses, which is "sum of deletion and insertion") that a modification needs to have to be broken, and the second number is the minimum amount of "delete" a surviving broken pair must have to avoid being merged back together. It can be abbreviated to "-B" to use default for both, "-B9" or "-B9/" to use 90% for "edit" but default (80%) for merge avoidance, or "-B/75" to use default (99%) "edit" and 75% for merge avoidance. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-03 10:40:28 +02:00			`}`
[PATCH] Diffcore updates. This moves the path selection logic from individual programs to a new diffcore transformer (diff-tree still needs to have its own for performance reasons). Also the header printing code in diff-tree was tweaked not to produce anything when pickaxe is in effect and there is nothing interesting to report. An interesting example is the following in the GIT archive itself: $ git-whatchanged -p -C -S'or something in a real script' Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-22 19:04:37 +02:00
Make ce_uptodate() trustworthy again The rule has always been that a cache entry that is ce_uptodate(ce) means that we already have checked the work tree entity and we know there is no change in the work tree compared to the index, and nobody should have to double check. Note that false ce_uptodate(ce) does not mean it is known to be dirty---it only means we don't know if it is clean. There are a few codepaths (refresh-index and preload-index are among them) that mark a cache entry as up-to-date based solely on the return value from ie_match_stat(); this function uses lstat() to see if the work tree entity has been touched, and for a submodule entry, if its HEAD points at the same commit as the commit recorded in the index of the superproject (a submodule that is not even cloned is considered clean). A submodule is no longer considered unmodified merely because its HEAD matches the index of the superproject these days, in order to prevent people from forgetting to commit in the submodule and updating the superproject index with the new submodule commit, before commiting the state in the superproject. However, the patch to do so didn't update the codepath that marks cache entries up-to-date based on the updated definition and instead worked it around by saying "we don't trust the return value of ce_uptodate() for submodules." This makes ce_uptodate() trustworthy again by not marking submodule entries up-to-date. The next step _could_ be to introduce a few "in-core" flag bits to cache_entry structure to record "this entry is _known_ to be dirty", call is_submodule_modified() from ie_match_stat(), and use these new bits to avoid running this rather expensive check more than once, but that can be a separate patch. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-24 09:10:20 +01:00			`if (ce_uptodate(ce) \|\| ce_skip_worktree(ce))`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`continue;`
diff-files: careful when inspecting work tree items This fixes the same breakage in diff-files. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:30:08 +02:00
Prevent diff machinery from examining assume-unchanged entries on worktree Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-08-11 17:43:59 +02:00			`/* If CE_VALID is set, don't look at workdir for file removal */`
			`changed = (ce->ce_flags & CE_VALID) ? 0 : check_removed(ce, &st);`
diff-files: careful when inspecting work tree items This fixes the same breakage in diff-files. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:30:08 +02:00			`if (changed) {`
			`if (changed < 0) {`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`perror(ce->name);`
[PATCH] Fix the way diffcore-rename records unremoved source. Earier version of diffcore-rename used to keep unmodified filepair in its output so that the last stage of the processing that tells renames from copies can make all of rename/copy to copies. However this had a bad interaction with other diffcore filters that wanted to run after diffcore-rename, in that such unmodified filepair must be retained for proper distinction between renames and copies to happen. This patch fixes the problem by changing the way diffcore-rename records the information needed to distinguish "all are copies" case and "the last one is a rename" case. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-28 00:55:55 +02:00			`continue;`
			`}`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`if (silent_on_removed)`
			`continue;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`diff_addremove(&revs->diffopt, '-', ce->ce_mode,`
diff: do not use null sha1 as a sentinel value The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-28 17:03:01 +02:00			`ce->sha1, !is_null_sha1(ce->sha1),`
			`ce->name, 0);`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`continue;`
[PATCH] Add --diff-filter= output restriction to diff-* family. This is a halfway between debugging aid and a helper to write an ultra-smart merge scripts. The new option takes a string that consists of a list of "status" letters, and limits the diff output to only those classes of changes, with two exceptions: - A broken pair (aka "complete rewrite"), does not match D (deleted) or N (created). Use B to look for them. - The letter "A" in the diff-filter string does not match anything itself, but causes the entire diff that contains selected patches to be output (this behaviour is similar to that of --pickaxe-all for the -S option). For example, $ git-rev-list HEAD \| git-diff-tree --stdin -s -v -B -C --diff-filter=BCR shows a list of commits that have complete rewrite, copy, or rename. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-12 05:57:13 +02:00			`}`
Refactor dirty submodule detection in diff-lib.c Moving duplicated code into the new function match_stat_with_submodule(). Replacing the implicit activation of detailed checks for the dirtiness of submodules when DIFF_FORMAT_PATCH was selected with explicitly setting the recently added DIFF_OPT_DIRTY_SUBMODULES option in diff_setup_done(). Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-11 22:50:25 +01:00			`changed = match_stat_with_submodule(&revs->diffopt, ce, &st,`
			`ce_option, &dirty_submodule);`
git status: Fix false positive "new commits" output for dirty submodules Testing if the output "new commits" should appear in the long format of "git status" is done by comparing the hashes of the diffpair. This always resulted in printing "new commits" for submodules that contained untracked or modified content, even if they did not contain new commits. The reason was that match_stat_with_submodule() did set the "changed" flag for dirty submodules, resulting in two->sha1 being set to the null_sha1 at the call sites, which indicates that new commits are present. This is changed so that when no new commits are present, the same object names are in the sha1 field for both sides of the filepair, and the working tree side will have the "dirty_submodule" flag set when appropriate. For a submodule to be seen as modified even when it just has a dirty work tree, some conditions had to be extended to also check for the "dirty_submodule" flag. Unfortunately the test case that should have found this bug had been changed incorrectly too. It is fixed and extended to test for other combinations too. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-12 22:23:52 +01:00			`if (!changed && !dirty_submodule) {`
diff-files: mark an index entry we know is up-to-date as such This does not make any difference when running diff-files alone, but if you internally run run_diff_files() and then run other operations further on the index, we do not have to run lstat(2) again on entries we already have checked. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-30 21:39:25 +02:00			`ce_mark_uptodate(ce);`
			`if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))`
			`continue;`
			`}`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`oldmode = ce->ce_mode;`
			`newmode = ce_mode_from_stat(ce, st.st_mode);`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`diff_change(&revs->diffopt, oldmode, newmode,`
			`ce->sha1, (changed ? null_sha1 : ce->sha1),`
diff: do not use null sha1 as a sentinel value The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-28 17:03:01 +02:00			`!is_null_sha1(ce->sha1), (changed ? 0 : !is_null_sha1(ce->sha1)),`
Performance optimization for detection of modified submodules In the worst case is_submodule_modified() got called three times for each submodule. The information we got from scanning the whole submodule tree the first time can be reused instead. New parameters have been added to diff_change() and diff_addremove(), the information is stored in a new member of struct diff_filespec. Its value is then reused instead of calling is_submodule_modified() again. When no explicit "-dirty" is needed in the output the call to is_submodule_modified() is not necessary when the submodules HEAD already disagrees with the ref of the superproject, as this alone marks it as modified. To achieve that, get_stat_data() got an extra argument. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-18 21:26:18 +01:00			`ce->name, 0, dirty_submodule);`
[PATCH] Reworked external diff interface. This introduces three public functions for diff-cache and friends can use to call out to the GIT_EXTERNAL_DIFF program when they wish to. A normal "add/remove/change" entry is turned into 7-parameter process invocation of GIT_EXTERNAL_DIFF program as before. In addition, the program can now be called with a single parameter when diff-cache and friends want to report an unmerged path. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-27 18:21:00 +02:00
[PATCH] Simplify "reverse-diff" logic in the diff core. Instead of swapping the arguments just before output, this patch makes the swapping happen on the input side of the diff core, when "reverse-diff" is in effect. This greatly simplifies the logic, but more importantly it is necessary for upcoming "copy detection" work. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-20 18:54:07 +02:00			`}`
Libify diff-files. This is the first installment to libify diff brothers. The updated diff-files uses revision.c::setup_revisions() infrastructure to parse its command line arguments, which means the pathname arguments are checked more strictly than before. The tests are adjusted to separate possibly missing paths from the rest of arguments with double-dashes, to show the kosher way. As Linus pointed out, renaming diff.c to diff-lib.c was simply stupid, so I am renaming it back. The new diff-lib.c is to contain pieces extracted from diff brothers. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 08:57:45 +02:00			`diffcore_std(&revs->diffopt);`
			`diff_flush(&revs->diffopt);`
			`return 0;`
[PATCH] Reworked external diff interface. This introduces three public functions for diff-cache and friends can use to call out to the GIT_EXTERNAL_DIFF program when they wish to. A normal "add/remove/change" entry is turned into 7-parameter process invocation of GIT_EXTERNAL_DIFF program as before. In addition, the program can now be called with a single parameter when diff-cache and friends want to report an unmerged path. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-27 18:21:00 +02:00			`}`
[PATCH] Diff-tree-helper take two. This reworks the diff-tree-helper and show-diff to further make external diff command interface simpler. These commands now honor GIT_EXTERNAL_DIFF environment variable which can point at an arbitrary program that takes 7 parameters: name file1 file1-sha1 file1-mode file2 file2-sha1 file2-mode The parameters for an external diff command are as follows: name this invocation of the command is to emit diff for the named cache/tree entry. file1 pathname that holds the contents of the first file. This can be a file inside the working tree, or a temporary file created from the blob object, or /dev/null. The command should not attempt to unlink it -- the temporary is unlinked by the caller. file1-sha1 sha1 hash if file1 is a blob object, or "." otherwise. file1-mode mode bits for file1, or "." for a deleted file. If GIT_EXTERNAL_DIFF environment variable is not set, the default is to invoke diff with the set of parameters old show-diff used to use. This built-in implementation honors the GIT_DIFF_CMD and GIT_DIFF_OPTS environment variables as before. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-26 18:25:05 +02:00
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`/*`
			`* diff-index`
			`*/`

			`/* A file entry went away or appeared */`
			`static void diff_index_show_file(struct rev_info *revs,`
			`const char *prefix,`
diff-lib, read-tree, unpack-trees: mark cache_entry pointers const Add const to struct cache_entry pointers throughout the tree which are only used for reading. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:55 +02:00			`const struct cache_entry *ce,`
diff: do not use null sha1 as a sentinel value The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-28 17:03:01 +02:00			`const unsigned char *sha1, int sha1_valid,`
			`unsigned int mode,`
Performance optimization for detection of modified submodules In the worst case is_submodule_modified() got called three times for each submodule. The information we got from scanning the whole submodule tree the first time can be reused instead. New parameters have been added to diff_change() and diff_addremove(), the information is stored in a new member of struct diff_filespec. Its value is then reused instead of calling is_submodule_modified() again. When no explicit "-dirty" is needed in the output the call to is_submodule_modified() is not necessary when the submodules HEAD already disagrees with the ref of the superproject, as this alone marks it as modified. To achieve that, get_stat_data() got an extra argument. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-18 21:26:18 +01:00			`unsigned dirty_submodule)`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`{`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`diff_addremove(&revs->diffopt, prefix[0], mode,`
diff: do not use null sha1 as a sentinel value The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-28 17:03:01 +02:00			`sha1, sha1_valid, ce->name, dirty_submodule);`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`}`

diff-lib, read-tree, unpack-trees: mark cache_entry pointers const Add const to struct cache_entry pointers throughout the tree which are only used for reading. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:55 +02:00			`static int get_stat_data(const struct cache_entry *ce,`
diff-lib.c: constness strengthening The internal implementation of diff-index codepath used to use non const pointer to pass sha1 around, but it did not have to. With this, we can also lose the private no_sha1[] array, as we can use the public null_sha1[] array that exists exactly for the same purpose. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-02 09:57:26 +01:00			`const unsigned char **sha1p,`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`unsigned int *modep,`
Performance optimization for detection of modified submodules In the worst case is_submodule_modified() got called three times for each submodule. The information we got from scanning the whole submodule tree the first time can be reused instead. New parameters have been added to diff_change() and diff_addremove(), the information is stored in a new member of struct diff_filespec. Its value is then reused instead of calling is_submodule_modified() again. When no explicit "-dirty" is needed in the output the call to is_submodule_modified() is not necessary when the submodules HEAD already disagrees with the ref of the superproject, as this alone marks it as modified. To achieve that, get_stat_data() got an extra argument. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-18 21:26:18 +01:00			`int cached, int match_missing,`
git diff: Don't test submodule dirtiness with --ignore-submodules The diff family suppresses the output of submodule changes when requested but checks them nonetheless. But since recently submodules get examined for their dirtiness, which is rather expensive. There is no need to do that when the --ignore-submodules option is used, as the gathered information is never used anyway. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-23 17:37:26 +01:00			`unsigned dirty_submodule, struct diff_options diffopt)`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`{`
diff-lib.c: constness strengthening The internal implementation of diff-index codepath used to use non const pointer to pass sha1 around, but it did not have to. With this, we can also lose the private no_sha1[] array, as we can use the public null_sha1[] array that exists exactly for the same purpose. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-02 09:57:26 +01:00			`const unsigned char *sha1 = ce->sha1;`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`unsigned int mode = ce->ce_mode;`

Avoid unnecessary 'lstat()' calls in 'get_stat_data()' When we ask get_stat_data() to get the mode and size of an index entry, we can avoid the lstat() call if we have marked the index entry as being uptodate due to earlier lstat() calls. This avoids a lot of unnecessary lstat() calls in eg 'git checkout', where the last phase shows the differences to the working tree (requiring a diff), but earlier phases have already verified the index. On the kernel repo (with a fast machine and everything cached), this changes timings of a nul 'git checkout' from - Before (best of ten): 0.14user 0.05system 0:00.19elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+13237minor)pagefaults 0swaps - After 0.11user 0.03system 0:00.15elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+13235minor)pagefaults 0swaps so it can obviously be noticeable, although equally obviously it's not a show-stopper on this particular machine. The difference is likely larger on slower machines, or with operating systems that don't do as good a job of name caching. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-05-09 23:57:30 +02:00			`if (!cached && !ce_uptodate(ce)) {`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`int changed;`
			`struct stat st;`
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at all when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-09 18:21:07 +02:00			`changed = check_removed(ce, &st);`
diff-index: careful when inspecting work tree items Earlier, if you changed a staged path into a directory in the work tree, we happily ran lstat(2) on it and found that it exists, and declared that the user changed it to a gitlink. This is wrong for two reasons: (1) It may be a directory, but it may not be a submodule, and in the latter case, the change we need to report is "the blob at the path has disappeared". We need to check with resolve_gitlink_ref() to be consistent with what "git add" and "git update-index --add" does. (2) lstat(2) may have succeeded only because a leading component of the path was turned into a symbolic link that points at something that exists in the work tree. In such a case, the path itself does not exist anymore, as far as the index is concerned. This fixes these breakages in diff-index that the previous patch has exposed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:29:48 +02:00			`if (changed < 0)`
			`return -1;`
			`else if (changed) {`
			`if (match_missing) {`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`*sha1p = sha1;`
			`*modep = mode;`
			`return 0;`
			`}`
			`return -1;`
			`}`
Refactor dirty submodule detection in diff-lib.c Moving duplicated code into the new function match_stat_with_submodule(). Replacing the implicit activation of detailed checks for the dirtiness of submodules when DIFF_FORMAT_PATCH was selected with explicitly setting the recently added DIFF_OPT_DIRTY_SUBMODULES option in diff_setup_done(). Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-11 22:50:25 +01:00			`changed = match_stat_with_submodule(diffopt, ce, &st,`
			`0, dirty_submodule);`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`if (changed) {`
Do not take mode bits from index after type change. When we do not trust executable bit from lstat(2), we copied existing ce_mode bits without checking if the filesystem object is a regular file (which is the only thing we apply the "trust executable bit" business) nor if the blob in the index is a regular file (otherwise, we should do the same as registering a new regular file, which is to default non-executable). Noticed by Johannes Sixt. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-17 07:43:48 +01:00			`mode = ce_mode_from_stat(ce, st.st_mode);`
diff-lib.c: constness strengthening The internal implementation of diff-index codepath used to use non const pointer to pass sha1 around, but it did not have to. With this, we can also lose the private no_sha1[] array, as we can use the public null_sha1[] array that exists exactly for the same purpose. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-02 09:57:26 +01:00			`sha1 = null_sha1;`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`}`
			`}`

			`*sha1p = sha1;`
			`*modep = mode;`
			`return 0;`
			`}`

Cleanup of unused symcache variable inside diff-lib.c Commit c40641b77b0274186fd1b327d5dc3246f814aaaf, 'Optimize symlink/directory detection' by Linus Torvalds, removed the 'char symcache' parameter to the has_symlink_leading_path() function. This made all variables currently named 'symcache' inside diff-lib.c unnecessary. This also let us throw away the 'struct oneway_unpack_data', and instead directly use the 'struct rev_info revs' member, which was the only member left after removal of the 'symcache[] array' member. The 'struct oneway_unpack_data' was introduced by the following commit: 948dd346 "diff-files: careful when inspecting work tree items" Impact: cleanup PATH_MAX bytes less memory stack usage in some cases Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-11 19:36:42 +01:00			`static void show_new_file(struct rev_info *revs,`
diff-lib, read-tree, unpack-trees: mark cache_entry pointers const Add const to struct cache_entry pointers throughout the tree which are only used for reading. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:55 +02:00			`const struct cache_entry *new,`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`int cached, int match_missing)`
			`{`
diff-lib.c: constness strengthening The internal implementation of diff-index codepath used to use non const pointer to pass sha1 around, but it did not have to. With this, we can also lose the private no_sha1[] array, as we can use the public null_sha1[] array that exists exactly for the same purpose. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-02 09:57:26 +01:00			`const unsigned char *sha1;`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`unsigned int mode;`
Performance optimization for detection of modified submodules In the worst case is_submodule_modified() got called three times for each submodule. The information we got from scanning the whole submodule tree the first time can be reused instead. New parameters have been added to diff_change() and diff_addremove(), the information is stored in a new member of struct diff_filespec. Its value is then reused instead of calling is_submodule_modified() again. When no explicit "-dirty" is needed in the output the call to is_submodule_modified() is not necessary when the submodules HEAD already disagrees with the ref of the superproject, as this alone marks it as modified. To achieve that, get_stat_data() got an extra argument. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-18 21:26:18 +01:00			`unsigned dirty_submodule = 0;`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00
diff-index: careful when inspecting work tree items Earlier, if you changed a staged path into a directory in the work tree, we happily ran lstat(2) on it and found that it exists, and declared that the user changed it to a gitlink. This is wrong for two reasons: (1) It may be a directory, but it may not be a submodule, and in the latter case, the change we need to report is "the blob at the path has disappeared". We need to check with resolve_gitlink_ref() to be consistent with what "git add" and "git update-index --add" does. (2) lstat(2) may have succeeded only because a leading component of the path was turned into a symbolic link that points at something that exists in the work tree. In such a case, the path itself does not exist anymore, as far as the index is concerned. This fixes these breakages in diff-index that the previous patch has exposed. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-31 02:29:48 +02:00			`/*`
			`* New file in the index: it might actually be different in`
Remove 'working copy' from the documentation and C code The git term is 'working tree', so replace the most public references to 'working copy'. Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-09-20 22:25:57 +02:00			`* the working tree.`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`*/`
Performance optimization for detection of modified submodules In the worst case is_submodule_modified() got called three times for each submodule. The information we got from scanning the whole submodule tree the first time can be reused instead. New parameters have been added to diff_change() and diff_addremove(), the information is stored in a new member of struct diff_filespec. Its value is then reused instead of calling is_submodule_modified() again. When no explicit "-dirty" is needed in the output the call to is_submodule_modified() is not necessary when the submodules HEAD already disagrees with the ref of the superproject, as this alone marks it as modified. To achieve that, get_stat_data() got an extra argument. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-18 21:26:18 +01:00			`if (get_stat_data(new, &sha1, &mode, cached, match_missing,`
git diff: Don't test submodule dirtiness with --ignore-submodules The diff family suppresses the output of submodule changes when requested but checks them nonetheless. But since recently submodules get examined for their dirtiness, which is rather expensive. There is no need to do that when the --ignore-submodules option is used, as the gathered information is never used anyway. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-23 17:37:26 +01:00			`&dirty_submodule, &revs->diffopt) < 0)`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`return;`

diff: do not use null sha1 as a sentinel value The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-28 17:03:01 +02:00			`diff_index_show_file(revs, "+", new, sha1, !is_null_sha1(sha1), mode, dirty_submodule);`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`}`

Cleanup of unused symcache variable inside diff-lib.c Commit c40641b77b0274186fd1b327d5dc3246f814aaaf, 'Optimize symlink/directory detection' by Linus Torvalds, removed the 'char symcache' parameter to the has_symlink_leading_path() function. This made all variables currently named 'symcache' inside diff-lib.c unnecessary. This also let us throw away the 'struct oneway_unpack_data', and instead directly use the 'struct rev_info revs' member, which was the only member left after removal of the 'symcache[] array' member. The 'struct oneway_unpack_data' was introduced by the following commit: 948dd346 "diff-files: careful when inspecting work tree items" Impact: cleanup PATH_MAX bytes less memory stack usage in some cases Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-11 19:36:42 +01:00			`static int show_modified(struct rev_info *revs,`
diff-lib, read-tree, unpack-trees: mark cache_entry pointers const Add const to struct cache_entry pointers throughout the tree which are only used for reading. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:55 +02:00			`const struct cache_entry *old,`
			`const struct cache_entry *new,`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`int report_missing,`
			`int cached, int match_missing)`
			`{`
			`unsigned int mode, oldmode;`
diff-lib.c: constness strengthening The internal implementation of diff-index codepath used to use non const pointer to pass sha1 around, but it did not have to. With this, we can also lose the private no_sha1[] array, as we can use the public null_sha1[] array that exists exactly for the same purpose. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-02 09:57:26 +01:00			`const unsigned char *sha1;`
Performance optimization for detection of modified submodules In the worst case is_submodule_modified() got called three times for each submodule. The information we got from scanning the whole submodule tree the first time can be reused instead. New parameters have been added to diff_change() and diff_addremove(), the information is stored in a new member of struct diff_filespec. Its value is then reused instead of calling is_submodule_modified() again. When no explicit "-dirty" is needed in the output the call to is_submodule_modified() is not necessary when the submodules HEAD already disagrees with the ref of the superproject, as this alone marks it as modified. To achieve that, get_stat_data() got an extra argument. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-18 21:26:18 +01:00			`unsigned dirty_submodule = 0;`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00
Performance optimization for detection of modified submodules In the worst case is_submodule_modified() got called three times for each submodule. The information we got from scanning the whole submodule tree the first time can be reused instead. New parameters have been added to diff_change() and diff_addremove(), the information is stored in a new member of struct diff_filespec. Its value is then reused instead of calling is_submodule_modified() again. When no explicit "-dirty" is needed in the output the call to is_submodule_modified() is not necessary when the submodules HEAD already disagrees with the ref of the superproject, as this alone marks it as modified. To achieve that, get_stat_data() got an extra argument. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-18 21:26:18 +01:00			`if (get_stat_data(new, &sha1, &mode, cached, match_missing,`
git diff: Don't test submodule dirtiness with --ignore-submodules The diff family suppresses the output of submodule changes when requested but checks them nonetheless. But since recently submodules get examined for their dirtiness, which is rather expensive. There is no need to do that when the --ignore-submodules option is used, as the gathered information is never used anyway. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-23 17:37:26 +01:00			`&dirty_submodule, &revs->diffopt) < 0) {`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`if (report_missing)`
			`diff_index_show_file(revs, "-", old,`
diff: do not use null sha1 as a sentinel value The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-28 17:03:01 +02:00			`old->sha1, 1, old->ce_mode, 0);`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`return -1;`
			`}`

diff-index --cc shows a 3-way diff between HEAD, index and working tree. This implements a 3-way diff between the HEAD commit, the state in the index, and the working directory. This is like the n-way diff for a merge, and uses much of the same code. It is invoked with the -c flag to git-diff-index, which it already accepted and did nothing with. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-04 13:38:40 +02:00			`if (revs->combine_merges && !cached &&`
			`(hashcmp(sha1, old->sha1) \|\| hashcmp(old->sha1, new->sha1))) {`
			`struct combine_diff_path *p;`
			`int pathlen = ce_namelen(new);`

			`p = xmalloc(combine_diff_path_size(2, pathlen));`
			`p->path = (char *) &p->parent[2];`
			`p->next = NULL;`
			`p->len = pathlen;`
			`memcpy(p->path, new->name, pathlen);`
			`p->path[pathlen] = 0;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`p->mode = mode;`
diff-index --cc shows a 3-way diff between HEAD, index and working tree. This implements a 3-way diff between the HEAD commit, the state in the index, and the working directory. This is like the n-way diff for a merge, and uses much of the same code. It is invoked with the -c flag to git-diff-index, which it already accepted and did nothing with. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-04 13:38:40 +02:00			`hashclr(p->sha1);`
			`memset(p->parent, 0, 2 * sizeof(struct combine_diff_parent));`
			`p->parent[0].status = DIFF_STATUS_MODIFIED;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`p->parent[0].mode = new->ce_mode;`
diff-index --cc shows a 3-way diff between HEAD, index and working tree. This implements a 3-way diff between the HEAD commit, the state in the index, and the working directory. This is like the n-way diff for a merge, and uses much of the same code. It is invoked with the -c flag to git-diff-index, which it already accepted and did nothing with. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-04 13:38:40 +02:00			`hashcpy(p->parent[0].sha1, new->sha1);`
			`p->parent[1].status = DIFF_STATUS_MODIFIED;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`p->parent[1].mode = old->ce_mode;`
diff-index --cc shows a 3-way diff between HEAD, index and working tree. This implements a 3-way diff between the HEAD commit, the state in the index, and the working directory. This is like the n-way diff for a merge, and uses much of the same code. It is invoked with the -c flag to git-diff-index, which it already accepted and did nothing with. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-04 13:38:40 +02:00			`hashcpy(p->parent[1].sha1, old->sha1);`
			`show_combined_diff(p, 2, revs->dense_combined_merges, revs);`
			`free(p);`
			`return 0;`
			`}`

Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`oldmode = old->ce_mode;`
git status: Fix false positive "new commits" output for dirty submodules Testing if the output "new commits" should appear in the long format of "git status" is done by comparing the hashes of the diffpair. This always resulted in printing "new commits" for submodules that contained untracked or modified content, even if they did not contain new commits. The reason was that match_stat_with_submodule() did set the "changed" flag for dirty submodules, resulting in two->sha1 being set to the null_sha1 at the call sites, which indicates that new commits are present. This is changed so that when no new commits are present, the same object names are in the sha1 field for both sides of the filepair, and the working tree side will have the "dirty_submodule" flag set when appropriate. For a submodule to be seen as modified even when it just has a dirty work tree, some conditions had to be extended to also check for the "dirty_submodule" flag. Unfortunately the test case that should have found this bug had been changed incorrectly too. It is fixed and extended to test for other combinations too. Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-12 22:23:52 +01:00			`if (mode == oldmode && !hashcmp(sha1, old->sha1) && !dirty_submodule &&`
Make the diff_options bitfields be an unsigned with explicit masks. reverse_diff was a bit-value in disguise, it's merged in the flags now. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 20:05:14 +01:00			`!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`return 0;`

			`diff_change(&revs->diffopt, oldmode, mode,`
diff: do not use null sha1 as a sentinel value The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-28 17:03:01 +02:00			`old->sha1, sha1, 1, !is_null_sha1(sha1),`
			`old->name, 0, dirty_submodule);`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`return 0;`
			`}`

Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`/*`
			`* This gets a mix of an existing index and a tree, one pathname entry`
			`* at a time. The index entry may be a single stage-0 one, but it could`
			`* also be multiple unmerged entries (in which case idx_pos/idx_nr will`
			`* give you the position and number of entries in the index).`
			`*/`
			`static void do_oneway_diff(struct unpack_trees_options *o,`
diff-lib, read-tree, unpack-trees: mark cache_entry pointers const Add const to struct cache_entry pointers throughout the tree which are only used for reading. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:55 +02:00			`const struct cache_entry *idx,`
			`const struct cache_entry *tree)`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`{`
Cleanup of unused symcache variable inside diff-lib.c Commit c40641b77b0274186fd1b327d5dc3246f814aaaf, 'Optimize symlink/directory detection' by Linus Torvalds, removed the 'char symcache' parameter to the has_symlink_leading_path() function. This made all variables currently named 'symcache' inside diff-lib.c unnecessary. This also let us throw away the 'struct oneway_unpack_data', and instead directly use the 'struct rev_info revs' member, which was the only member left after removal of the 'symcache[] array' member. The 'struct oneway_unpack_data' was introduced by the following commit: 948dd346 "diff-files: careful when inspecting work tree items" Impact: cleanup PATH_MAX bytes less memory stack usage in some cases Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-11 19:36:42 +01:00			`struct rev_info *revs = o->unpack_data;`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`int match_missing, cached;`
Libified diff-index: backward compatibility fix. "diff-index -m" does not mean "do not ignore merges", but means "pretend missing files match the index". The previous round tried to address this, but failed because setup_revisions() ate "-m" flag before the caller had a chance to intervene. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 12:58:04 +02:00
Prevent diff machinery from examining assume-unchanged entries on worktree Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-08-11 17:43:59 +02:00			`/* if the entry is not checked out, don't examine work tree */`
Teach Git to respect skip-worktree bit (reading part) grep: turn on --cached for files that is marked skip-worktree ls-files: do not check for deleted file that is marked skip-worktree update-index: ignore update request if it's skip-worktree, while still allows removing diff*: skip worktree version Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-08-20 15:46:58 +02:00			`cached = o->index_only \|\|`
			`(idx && ((idx->ce_flags & CE_VALID) \|\| ce_skip_worktree(idx)));`
War on whitespace This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-07 09:04:01 +02:00			`/*`
Libified diff-index: backward compatibility fix. "diff-index -m" does not mean "do not ignore merges", but means "pretend missing files match the index". The previous round tried to address this, but failed because setup_revisions() ate "-m" flag before the caller had a chance to intervene. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 12:58:04 +02:00			`* Backward compatibility wart - "diff-index -m" does`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`* not mean "do not ignore merges", but "match_missing".`
			`*`
			`* But with the revision flag parsing, that's found in`
			`* "!revs->ignore_merges".`
			`*/`
			`match_missing = !revs->ignore_merges;`

			`if (cached && idx && ce_stage(idx)) {`
diff: remove often unused parameters from diff_unmerge() e9c8409 (diff-index --cached --raw: show tree entry on the LHS for unmerged entries., 2007-01-05) added a <mode, object name> pair as parameters to this function, to store them in the pre-image side of an unmerged file pair. Now the function is fixed to return the filepair it queued, we can make the caller on the special case codepath to do so. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-04-23 01:05:58 +02:00			`struct diff_filepair *pair;`
			`pair = diff_unmerge(&revs->diffopt, idx->name);`
reset [<commit>] paths...: do not mishandle unmerged paths Because "diff --cached HEAD" showed an incorrect blob object name on the LHS of the diff, we ended up updating the index entry with bogus value, not what we read from the tree. Noticed by John Nowak. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-07-14 06:36:29 +02:00			`if (tree)`
diff: do not use null sha1 as a sentinel value The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-28 17:03:01 +02:00			`fill_filespec(pair->one, tree->sha1, 1, tree->ce_mode);`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`return;`
			`}`

			`/*`
			`* Something added to the tree?`
			`*/`
			`if (!tree) {`
Cleanup of unused symcache variable inside diff-lib.c Commit c40641b77b0274186fd1b327d5dc3246f814aaaf, 'Optimize symlink/directory detection' by Linus Torvalds, removed the 'char symcache' parameter to the has_symlink_leading_path() function. This made all variables currently named 'symcache' inside diff-lib.c unnecessary. This also let us throw away the 'struct oneway_unpack_data', and instead directly use the 'struct rev_info revs' member, which was the only member left after removal of the 'symcache[] array' member. The 'struct oneway_unpack_data' was introduced by the following commit: 948dd346 "diff-files: careful when inspecting work tree items" Impact: cleanup PATH_MAX bytes less memory stack usage in some cases Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-11 19:36:42 +01:00			`show_new_file(revs, idx, cached, match_missing);`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`return;`
			`}`

			`/*`
			`* Something removed from the tree?`
Libified diff-index: backward compatibility fix. "diff-index -m" does not mean "do not ignore merges", but means "pretend missing files match the index". The previous round tried to address this, but failed because setup_revisions() ate "-m" flag before the caller had a chance to intervene. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 12:58:04 +02:00			`*/`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`if (!idx) {`
diff: do not use null sha1 as a sentinel value The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-28 17:03:01 +02:00			`diff_index_show_file(revs, "-", tree, tree->sha1, 1, tree->ce_mode, 0);`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`return;`
			`}`

			`/* Show difference between old and new */`
Cleanup of unused symcache variable inside diff-lib.c Commit c40641b77b0274186fd1b327d5dc3246f814aaaf, 'Optimize symlink/directory detection' by Linus Torvalds, removed the 'char symcache' parameter to the has_symlink_leading_path() function. This made all variables currently named 'symcache' inside diff-lib.c unnecessary. This also let us throw away the 'struct oneway_unpack_data', and instead directly use the 'struct rev_info revs' member, which was the only member left after removal of the 'symcache[] array' member. The 'struct oneway_unpack_data' was introduced by the following commit: 948dd346 "diff-files: careful when inspecting work tree items" Impact: cleanup PATH_MAX bytes less memory stack usage in some cases Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-11 19:36:42 +01:00			`show_modified(revs, tree, idx, 1, cached, match_missing);`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`}`

			`/*`
			`* The unpack_trees() interface is designed for merging, so`
			`* the different source entries are designed primarily for`
			`* the source trees, with the old index being really mainly`
			`* used for being replaced by the result.`
			`*`
			`* For diffing, the index is more important, and we only have a`
			`* single tree.`
			`*`
diff-lib.c: fix misleading comments on oneway_diff() 20a16eb (unpack_trees(): fix diff-index regression., 2008-03-10) adjusted diff-index to the new world order since 34110cd (Make 'unpack_trees()' have a separate source and destination index, 2008-03-06). Callbacks are expected to return anything non-negative as "success", and instead of reporting how many index entries they have processed, they are expected to advance o->pos themselves. The code did so, but a stale comment was left behind. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-09-18 07:12:17 +02:00			`* We're supposed to advance o->pos to skip what we have already processed.`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`*`
			`* This wrapper makes it all more readable, and takes care of all`
			`* the fairly complex unpack_trees() semantic requirements, including`
			`* the skipping, the path matching, the type conflict cases etc.`
			`*/`
diff-lib, read-tree, unpack-trees: mark cache_entry array paramters const Change the type merge_fn_t to accept the array of cache_entry pointers as const pointers to const pointers. This documents the fact that the merge functions don't modify the cache_entry contents or replace any of the pointers in the array. Only a single cast is necessary in unpack_nondirectories because adding two const modifiers at once is not allowed in C. The cast is safe in that it doesn't mask any modfication; call_unpack_fn only needs the array for reading. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:56 +02:00			`static int oneway_diff(const struct cache_entry * const *src,`
			`struct unpack_trees_options *o)`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`{`
diff-lib, read-tree, unpack-trees: mark cache_entry pointers const Add const to struct cache_entry pointers throughout the tree which are only used for reading. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:55 +02:00			`const struct cache_entry *idx = src[0];`
			`const struct cache_entry *tree = src[1];`
Cleanup of unused symcache variable inside diff-lib.c Commit c40641b77b0274186fd1b327d5dc3246f814aaaf, 'Optimize symlink/directory detection' by Linus Torvalds, removed the 'char symcache' parameter to the has_symlink_leading_path() function. This made all variables currently named 'symcache' inside diff-lib.c unnecessary. This also let us throw away the 'struct oneway_unpack_data', and instead directly use the 'struct rev_info revs' member, which was the only member left after removal of the 'symcache[] array' member. The 'struct oneway_unpack_data' was introduced by the following commit: 948dd346 "diff-files: careful when inspecting work tree items" Impact: cleanup PATH_MAX bytes less memory stack usage in some cases Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-11 19:36:42 +01:00			`struct rev_info *revs = o->unpack_data;`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00
			`/*`
			`* Unpack-trees generates a DF/conflict entry if`
			`* there was a directory in the index and a tree`
			`* in the tree. From a diff standpoint, that's a`
			`* delete of the tree and a create of the file.`
			`*/`
			`if (tree == o->df_conflict_entry)`
			`tree = NULL;`

diff-index --quiet: learn the "stop feeding the backend early" logic A negative return from the unpack callback function usually means unpack failed for the entry and signals the unpack_trees() machinery to fail the entire merge operation, immediately and there is no other way for the callback to tell the machinery to exit early without reporting an error. This is what we usually want to make a merge all-or-nothing operation, but the machinery is also used for diff-index codepath by using a custom unpack callback function. And we do sometimes want to exit early without failing, namely when we are under --quiet and can short-cut the diff upon finding the first difference. Add "exiting_early" field to unpack_trees_options structure, to signal the unpack_trees() machinery that the negative return value is not signaling an error but an early return from the unpack_trees() machinery. As this by definition hasn't unpacked everything, discard the resulting index just like the failure codepath. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-31 19:06:44 +02:00			`if (ce_path_match(idx ? idx : tree, &revs->prune_data)) {`
Make 'unpack_trees()' have a separate source and destination index We will always unpack into our own internal index, but we will take the source from wherever specified, and we will optionally write the result to a specified index (optionally, because not everybody even _wants_ any result: the index diffing really wants to just walk the tree and index in parallel). This ends up removing a fair number more lines than it adds, for the simple reason that we can now skip all the crud that tried to be oh-so-careful about maintaining our position in the index as we were traversing and modifying it. Since we don't actually modify the source index any more, we can just update the 'o->pos' pointer without worrying about whether an index entry got removed or replaced or added to. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-07 03:12:28 +01:00			`do_oneway_diff(o, idx, tree);`
diff-index --quiet: learn the "stop feeding the backend early" logic A negative return from the unpack callback function usually means unpack failed for the entry and signals the unpack_trees() machinery to fail the entire merge operation, immediately and there is no other way for the callback to tell the machinery to exit early without reporting an error. This is what we usually want to make a merge all-or-nothing operation, but the machinery is also used for diff-index codepath by using a custom unpack callback function. And we do sometimes want to exit early without failing, namely when we are under --quiet and can short-cut the diff upon finding the first difference. Add "exiting_early" field to unpack_trees_options structure, to signal the unpack_trees() machinery that the negative return value is not signaling an error but an early return from the unpack_trees() machinery. As this by definition hasn't unpacked everything, discard the resulting index just like the failure codepath. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-31 19:06:44 +02:00			`if (diff_can_quit_early(&revs->diffopt)) {`
			`o->exiting_early = 1;`
			`return -1;`
			`}`
			`}`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00
Make 'unpack_trees()' have a separate source and destination index We will always unpack into our own internal index, but we will take the source from wherever specified, and we will optionally write the result to a specified index (optionally, because not everybody even _wants_ any result: the index diffing really wants to just walk the tree and index in parallel). This ends up removing a fair number more lines than it adds, for the simple reason that we can now skip all the crud that tried to be oh-so-careful about maintaining our position in the index as we were traversing and modifying it. Since we don't actually modify the source index any more, we can just update the 'o->pos' pointer without worrying about whether an index entry got removed or replaced or added to. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-07 03:12:28 +01:00			`return 0;`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`}`

diff-lib: refactor run_diff_index() and do_diff_cache() The latter is meant to be an API for internal callers that want to inspect the resulting diff-queue, while the former is an implementation of "git diff-index" command. Extract the common logic into a single helper function and make them thin wrappers around it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-07-14 03:57:42 +02:00			`static int diff_cache(struct rev_info *revs,`
			`const unsigned char *tree_sha1,`
			`const char *tree_name,`
			`int cached)`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`{`
			`struct tree *tree;`
			`struct tree_desc t;`
diff-lib: refactor run_diff_index() and do_diff_cache() The latter is meant to be an API for internal callers that want to inspect the resulting diff-queue, while the former is an implementation of "git diff-index" command. Extract the common logic into a single helper function and make them thin wrappers around it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-07-14 03:57:42 +02:00			`struct unpack_trees_options opts;`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00
diff-lib: refactor run_diff_index() and do_diff_cache() The latter is meant to be an API for internal callers that want to inspect the resulting diff-queue, while the former is an implementation of "git diff-index" command. Extract the common logic into a single helper function and make them thin wrappers around it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-07-14 03:57:42 +02:00			`tree = parse_tree_indirect(tree_sha1);`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`if (!tree)`
diff-lib: refactor run_diff_index() and do_diff_cache() The latter is meant to be an API for internal callers that want to inspect the resulting diff-queue, while the former is an implementation of "git diff-index" command. Extract the common logic into a single helper function and make them thin wrappers around it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-07-14 03:57:42 +02:00			`return error("bad tree object %s",`
			`tree_name ? tree_name : sha1_to_hex(tree_sha1));`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`memset(&opts, 0, sizeof(opts));`
			`opts.head_idx = 1;`
			`opts.index_only = cached;`
Avoid "diff-index --cached" optimization under --find-copies-harder When find-copies-harder is in effect, the diff frontends are expected to feed all paths, not just changed paths, to the diffcore, so that copy sources can be picked up. In such a case, not descending into subtrees using the cache-tree information is simply wrong. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-05-23 08:14:25 +02:00			`opts.diff_index_cached = (cached &&`
			`!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER));`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`opts.merge = 1;`
			`opts.fn = oneway_diff;`
Cleanup of unused symcache variable inside diff-lib.c Commit c40641b77b0274186fd1b327d5dc3246f814aaaf, 'Optimize symlink/directory detection' by Linus Torvalds, removed the 'char symcache' parameter to the has_symlink_leading_path() function. This made all variables currently named 'symcache' inside diff-lib.c unnecessary. This also let us throw away the 'struct oneway_unpack_data', and instead directly use the 'struct rev_info revs' member, which was the only member left after removal of the 'symcache[] array' member. The 'struct oneway_unpack_data' was introduced by the following commit: 948dd346 "diff-files: careful when inspecting work tree items" Impact: cleanup PATH_MAX bytes less memory stack usage in some cases Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-11 19:36:42 +01:00			`opts.unpack_data = revs;`
Make 'unpack_trees()' have a separate source and destination index We will always unpack into our own internal index, but we will take the source from wherever specified, and we will optionally write the result to a specified index (optionally, because not everybody even _wants_ any result: the index diffing really wants to just walk the tree and index in parallel). This ends up removing a fair number more lines than it adds, for the simple reason that we can now skip all the crud that tried to be oh-so-careful about maintaining our position in the index as we were traversing and modifying it. Since we don't actually modify the source index any more, we can just update the 'o->pos' pointer without worrying about whether an index entry got removed or replaced or added to. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-07 03:12:28 +01:00			`opts.src_index = &the_index;`
			`opts.dst_index = NULL;`
diff-index: pass pathspec down to unpack-trees machinery And finally, pass the pathspec down through unpack_trees() to traverse_trees() callchain. Before and after applying this series, looking for changes in the kernel repository with a fairly narrow pathspec becomes somewhat faster. (without patch) $ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6 >/dev/null 0.48user 0.05system 0:00.53elapsed 100%CPU (0avgtext+0avgdata 163296maxresident)k 0inputs+952outputs (0major+11163minor)pagefaults 0swaps (with patch) $ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6 >/dev/null 0.01user 0.00system 0:00.02elapsed 104%CPU (0avgtext+0avgdata 43856maxresident)k 0inputs+24outputs (0major+3688minor)pagefaults 0swaps Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-29 22:34:08 +02:00			`opts.pathspec = &revs->diffopt.pathspec;`
diff-index: enable recursive pathspec matching in unpack_trees The pathspec structure has a few bits of data to drive various operation modes after we unified the pathspec matching logic in various codepaths. For example, max_depth field is there so that "git grep" can limit the output for files found in limited depth of tree traversal. Also in order to show just the surface level differences in "git diff-tree", recursive field stops us from descending into deeper level of the tree structure when it is set to false, and this also affects pathspec matching when we have wildcards in the pathspec. The diff-index has always wanted the recursive behaviour, and wanted to match pathspecs without any depth limit. But we forgot to do so when we updated tree_entry_interesting() logic to unify the pathspec matching logic. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-15 11:03:27 +01:00			`opts.pathspec->recursive = 1;`
			`opts.pathspec->max_depth = -1;`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00
			`init_tree_desc(&t, tree->buffer, tree->size);`
diff-lib: refactor run_diff_index() and do_diff_cache() The latter is meant to be an API for internal callers that want to inspect the resulting diff-queue, while the former is an implementation of "git diff-index" command. Extract the common logic into a single helper function and make them thin wrappers around it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-07-14 03:57:42 +02:00			`return unpack_trees(1, &t, &opts);`
			`}`

			`int run_diff_index(struct rev_info *revs, int cached)`
			`{`
			`struct object_array_entry *ent;`

			`ent = revs->pending.objects;`
			`if (diff_cache(revs, ent->item->sha1, ent->name, cached))`
Allow callers of unpack_trees() to handle failure Return an error from unpack_trees() instead of calling die(), and exit with an error in read-tree, builtin-commit, and diff-lib. merge-recursive already expected an error return from unpack_trees, so it doesn't need to be changed. The merge function can return negative to abort. This will be used in builtin-checkout -m. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> 2008-02-07 17:39:48 +01:00			`exit(128);`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00
diff: vary default prefix depending on what are compared With a new configuration "diff.mnemonicprefix", "git diff" shows the differences between various combinations of preimage and postimage trees with prefixes different from the standard "a/" and "b/". Hopefully this will make the distinction stand out for some people. "git diff" compares the (i)ndex and the (w)ork tree; "git diff HEAD" compares a (c)ommit and the (w)ork tree; "git diff --cached" compares a (c)ommit and the (i)ndex; "git-diff HEAD:file1 file2" compares an (o)bject and a (w)ork tree entity; "git diff --no-index a b" compares two non-git things (1) and (2). Because these mnemonics now have meanings, they are swapped when reverse diff is in effect and this feature is enabled. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-19 05:08:09 +02:00			`diff_set_mnemonic_prefix(&revs->diffopt, "c/", cached ? "i/" : "w/");`
unpack-trees.c: look ahead in the index This makes the traversal of index be in sync with the tree traversal. When unpack_callback() is fed a set of tree entries from trees, it inspects the name of the entry and checks if the an index entry with the same name could be hiding behind the current index entry, and (1) if the name appears in the index as a leaf node, it is also fed to the n_way_merge() callback function; (2) if the name is a directory in the index, i.e. there are entries in that are underneath it, then nothing is fed to the n_way_merge() callback function; (3) otherwise, if the name comes before the first eligible entry in the index, the index entry is first unpacked alone. When traverse_trees_recursive() descends into a subdirectory, the cache_bottom pointer is moved to walk index entries within that directory. All of these are omitted for diff-index, which does not even want to be fed an index entry and a tree entry with D/F conflicts. This fixes 3-way read-tree and exposes a bug in other parts of the system in t6035, test #5. The test prepares these three trees: O = HEAD^ 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/b-2/c/d 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/b/c/d 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/x A = HEAD 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/b-2/c/d 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/b/c/d 100644 blob 587be6b4c3f93f93c489c0111bba5596147a26cb a/x B = master 120000 blob a36b77384451ea1de7bd340ffca868249626bc52 a/b 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/b-2/c/d 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 a/x With a clean index that matches HEAD, running git read-tree -m -u --aggressive $O $A $B now yields 120000 a36b77384451ea1de7bd340ffca868249626bc52 3 a/b 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 a/b-2/c/d 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 1 a/b/c/d 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 2 a/b/c/d 100644 587be6b4c3f93f93c489c0111bba5596147a26cb 0 a/x which is correct. "master" created "a/b" symlink that did not exist, and removed "a/b/c/d" while HEAD did not do touch either path. Before this series, read-tree did not notice the situation and resolved addition of "a/b" and removal of "a/b/c/d" independently. If A = HEAD had another path "a/b/c/e" added, this merge should conflict but instead it silently resolved "a/b" and then immediately overwrote it to add "a/b/c/e", which was quite bogus. Tests in t1012 start to work with this. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-09-20 09:03:39 +02:00			`diffcore_fix_diff_index(&revs->diffopt);`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`diffcore_std(&revs->diffopt);`
			`diff_flush(&revs->diffopt);`
Make run_diff_index() use unpack_trees(), not read_tree() A plain "git commit" would still run lstat() a lot more than necessary, because wt_status_print() would cause the index to be repeatedly flushed and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit to be lost, resulting in the files in the index being lstat'ed three times each. The reason why wt-status.c ended up invalidating and re-reading the cache multiple times was that it uses "run_diff_index()", which in turn uses "read_tree()" to populate the index with both the old index and the tree we want to compare against. So this patch re-writes run_diff_index() to not use read_tree(), but instead use "unpack_trees()" to diff the index to a tree. That, in turn, means that we don't need to modify the index itself, which then means that we don't need to invalidate it and re-read it! This, together with the lstat() optimizations, means that "git commit" on the kernel tree really only needs to lstat() the index entries once. That noticeably cuts down on the cached timings. Best time before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.399s user 0m0.232s sys 0m0.164s Best time after: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.254s user 0m0.140s sys 0m0.112s so it's a noticeable improvement in addition to being a nice conceptual cleanup (it's really not that pretty that "run_diff_index()" dirties the index!) Doing an "strace -c" on it also shows that as it cuts the number of lstat() calls by two thirds, it goes from being lstat()-limited to being limited by getdents() (which is the readdir system call): Before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 60.69 0.000704 0 69230 31 lstat 23.62 0.000274 0 5522 getdents 8.36 0.000097 0 5508 2638 open 2.59 0.000030 0 2869 close 2.50 0.000029 0 274 write 1.47 0.000017 0 2844 fstat After: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 45.17 0.000276 0 5522 getdents 26.51 0.000162 0 23112 31 lstat 19.80 0.000121 0 5503 2638 open 4.91 0.000030 0 2864 close 1.48 0.000020 0 274 write 1.34 0.000018 0 2844 fstat ... It passes the test-suite for me, but this is another of one of those really core functions, and certainly pretty subtle, so.. NOTE! The Linux lstat() system call is really quite cheap when everything is cached, so the fact that this is quite noticeable on Linux is likely to mean that it is much more noticeable on other operating systems. I bet you'll see a much bigger performance improvement from this on Windows in particular. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 02:27:12 +01:00			`return 0;`
Libify diff-index. The second installment to libify diff brothers. The pathname arguments are checked more strictly than before because we now use the revision.c::setup_revisions() infrastructure. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-22 11:43:00 +02:00			`}`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00
			`int do_diff_cache(const unsigned char tree_sha1, struct diff_options opt)`
			`{`
			`struct rev_info revs;`

			`init_revisions(&revs, NULL);`
struct rev_info: convert prune_data to struct pathspec Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 13:43:06 +01:00			`init_pathspec(&revs.prune_data, opt->pathspec.raw);`
diff-lib: refactor run_diff_index() and do_diff_cache() The latter is meant to be an API for internal callers that want to inspect the resulting diff-queue, while the former is an implementation of "git diff-index" command. Extract the common logic into a single helper function and make them thin wrappers around it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-07-14 03:57:42 +02:00			`revs.diffopt = *opt;`
Also use unpack_trees() in do_diff_cache() As in run_diff_index(), we call unpack_trees() with the oneway_diff() function in do_diff_cache() now. This makes the function diff_cache() obsolete. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 16:19:56 +01:00
diff-lib: refactor run_diff_index() and do_diff_cache() The latter is meant to be an API for internal callers that want to inspect the resulting diff-queue, while the former is an implementation of "git diff-index" command. Extract the common logic into a single helper function and make them thin wrappers around it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-07-14 03:57:42 +02:00			`if (diff_cache(&revs, tree_sha1, NULL, 1))`
Allow callers of unpack_trees() to handle failure Return an error from unpack_trees() instead of calling die(), and exit with an error in read-tree, builtin-commit, and diff-lib. merge-recursive already expected an error return from unpack_trees, so it doesn't need to be changed. The merge function can return negative to abort. This will be used in builtin-checkout -m. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> 2008-02-07 17:39:48 +01:00			`exit(128);`
Also use unpack_trees() in do_diff_cache() As in run_diff_index(), we call unpack_trees() with the oneway_diff() function in do_diff_cache() now. This makes the function diff_cache() obsolete. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-20 16:19:56 +01:00			`return 0;`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`}`
Generalize and libify index_is_dirty() to index_differs_from(...) index_is_dirty() in builtin-revert.c checks if the index is dirty. This patch generalizes this function to check if the index differs from a revision, i.e. the former index_is_dirty() behavior can now be achieved by index_differs_from("HEAD", 0). The second argument "diff_flags" allows to set further diff option flags like DIFF_OPT_IGNORE_SUBMODULES. See DIFF_OPT_* macros in diff.h for a list. index_differs_from() seems to be useful for more than builtin-revert.c, so it is moved into diff-lib.c and also used in builtin-commit.c. Yet to mention: - "rev.abbrev = 0;" can be safely removed. This has no impact on performance or functioning of neither setup_revisions() nor run_diff_index(). - rev.pending.objects is free()d because this fixes a leak. (Also see 295dd2ad "Fix memory leak in traverse_commit_list") Mentored-by: Daniel Barkalow <barkalow@iabervon.org> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Stephan Beyer <s-beyer@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-10 15:30:35 +01:00
			`int index_differs_from(const char *def, int diff_flags)`
			`{`
			`struct rev_info rev;`
revision: introduce setup_revision_opt So far the last parameter to setup_revisions() was to specify the default ref when the command line did not give any (typically "HEAD"). This changes it to take a pointer to a structure so that we can add other information without touching too many codepaths in later patches. There is no functionality change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-09 07:58:09 +01:00			`struct setup_revision_opt opt;`
Generalize and libify index_is_dirty() to index_differs_from(...) index_is_dirty() in builtin-revert.c checks if the index is dirty. This patch generalizes this function to check if the index differs from a revision, i.e. the former index_is_dirty() behavior can now be achieved by index_differs_from("HEAD", 0). The second argument "diff_flags" allows to set further diff option flags like DIFF_OPT_IGNORE_SUBMODULES. See DIFF_OPT_* macros in diff.h for a list. index_differs_from() seems to be useful for more than builtin-revert.c, so it is moved into diff-lib.c and also used in builtin-commit.c. Yet to mention: - "rev.abbrev = 0;" can be safely removed. This has no impact on performance or functioning of neither setup_revisions() nor run_diff_index(). - rev.pending.objects is free()d because this fixes a leak. (Also see 295dd2ad "Fix memory leak in traverse_commit_list") Mentored-by: Daniel Barkalow <barkalow@iabervon.org> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Stephan Beyer <s-beyer@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-10 15:30:35 +01:00
			`init_revisions(&rev, NULL);`
revision: introduce setup_revision_opt So far the last parameter to setup_revisions() was to specify the default ref when the command line did not give any (typically "HEAD"). This changes it to take a pointer to a structure so that we can add other information without touching too many codepaths in later patches. There is no functionality change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-09 07:58:09 +01:00			`memset(&opt, 0, sizeof(opt));`
			`opt.def = def;`
			`setup_revisions(0, NULL, &rev, &opt);`
diff: Rename QUIET internal option to QUICK The option "QUIET" primarily meant "find if we have _any_ difference as quick as possible and report", which means we often do not even have to look at blobs if we know the trees are different by looking at the higher level (e.g. "diff-tree A B"). As a side effect, because there is no point showing one change that we happened to have found first, it also enables NO_OUTPUT and EXIT_WITH_STATUS options, making the end result look quiet. Rename the internal option to QUICK to reflect this better; it also makes grepping the source tree much easier, as there are other kinds of QUIET option everywhere. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-05-23 10:15:35 +02:00			`DIFF_OPT_SET(&rev.diffopt, QUICK);`
Generalize and libify index_is_dirty() to index_differs_from(...) index_is_dirty() in builtin-revert.c checks if the index is dirty. This patch generalizes this function to check if the index differs from a revision, i.e. the former index_is_dirty() behavior can now be achieved by index_differs_from("HEAD", 0). The second argument "diff_flags" allows to set further diff option flags like DIFF_OPT_IGNORE_SUBMODULES. See DIFF_OPT_* macros in diff.h for a list. index_differs_from() seems to be useful for more than builtin-revert.c, so it is moved into diff-lib.c and also used in builtin-commit.c. Yet to mention: - "rev.abbrev = 0;" can be safely removed. This has no impact on performance or functioning of neither setup_revisions() nor run_diff_index(). - rev.pending.objects is free()d because this fixes a leak. (Also see 295dd2ad "Fix memory leak in traverse_commit_list") Mentored-by: Daniel Barkalow <barkalow@iabervon.org> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Stephan Beyer <s-beyer@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-10 15:30:35 +01:00			`DIFF_OPT_SET(&rev.diffopt, EXIT_WITH_STATUS);`
			`rev.diffopt.flags \|= diff_flags;`
			`run_diff_index(&rev, 1);`
			`if (rev.pending.alloc)`
			`free(rev.pending.objects);`
			`return (DIFF_OPT_TST(&rev.diffopt, HAS_CHANGES) != 0);`
			`}`