mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-16 06:03:44 +01:00

1994 lines

53 KiB

C

Raw Normal View History

Add copyright notices. The tool interface sucks (especially "committing" information, which is just me doing everything by hand from the command line), but I think this is in theory actually a viable way of describing the world. So copyright it. 2005-04-08 00:16:10 +02:00			`/*`
			`* GIT - The information manager from hell`
			`*`
			`* Copyright (C) Linus Torvalds, 2005`
			`*/`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`#define NO_THE_INDEX_COMPATIBILITY_MACROS`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00			`#include "cache.h"`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`#include "cache-tree.h"`
Teach core object handling functions about gitlinks This teaches the really fundamental core SHA1 object handling routines about gitlinks. We can compare trees with gitlinks in them (although we can not actually generate patches for them yet - just raw git diffs), and they show up as commits in "git ls-tree". We also know to compare gitlinks as if they were directories (ie the normal "sort as trees" rules apply). [jc: amended a cut&paste error] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-10 06:20:29 +02:00			`#include "refs.h"`
git-add: Add support for --refresh option. This allows to refresh only a subset of the project files, based on the specified pathspecs. Signed-off-by: Alexandre Julliard <julliard@winehq.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-08-11 23:59:01 +02:00			`#include "dir.h"`
builtin-add.c: restructure the code for maintainability A private function add_files_to_cache() in builtin-add.c was borrowed by checkout and commit re-implementors without getting properly refactored to more library-ish place. This does the refactoring. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-21 10:24:17 +02:00			`#include "tree.h"`
			`#include "commit.h"`
git-add --intent-to-add (-N) This adds "--intent-to-add" option to "git add". This is to let the system know that you will tell it the final contents to be staged later, iow, just be aware of the presense of the path with the type of the blob for now. It is implemented by staging an empty blob as the content. With this sequence: $ git reset --hard $ edit newfile $ git add -N newfile $ edit newfile oldfile $ git diff the diff will show all changes relative to the current commit. Then you can do: $ git commit -a ;# commit everything or $ git commit oldfile ;# only oldfile, newfile not yet added to pretend you are working with an index-free system like CVS. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-21 10:44:53 +02:00			`#include "blob.h"`
resolve-undo: record resolved conflicts in a new index extension section When resolving a conflict using "git add" to create a stage #0 entry, or "git rm" to remove entries at higher stages, remove_index_entry_at() function is eventually called to remove unmerged (i.e. higher stage) entries from the index. Introduce a "resolve_undo_info" structure and keep track of the removed cache entries, and save it in a new index extension section in the index_state. Operations like "read-tree -m", "merge", "checkout [-m] <branch>" and "reset" are signs that recorded information in the index is no longer necessary. The data is removed from the index extension when operations start; they may leave conflicted entries in the index, and later user actions like "git add" will record their conflicted states afresh. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-25 09:30:51 +01:00			`#include "resolve-undo.h"`
read-cache.c: read prefix-compressed names in index on-disk version v4 Because the entries are sorted by path, adjacent entries in the index tend to share the leading components of them, and it makes sense to only store the differences in later entries. In the v4 on-disk format of the index, each on-disk cache entry stores the number of bytes to be stripped from the end of the previous name, and the bytes to append to the result, to come up with its name. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:15 +02:00			`#include "strbuf.h"`
			`#include "varint.h"`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00
read-cache.c: mark file-local functions static Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-12 07:29:35 +01:00			`static struct cache_entry refresh_cache_entry(struct cache_entry ce, int really);`

Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`/* Mask for the name length in ce_flags in the on-disk index */`

			`#define CE_NAMEMASK (0x0fff)`

index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`/* Index extensions.`
			`*`
			`* The first letter should be 'A'..'Z' for extensions that are not`
			`* necessary for a correct operation (i.e. optimization data).`
			`* When new extensions are added that _needs_ to be understood in`
			`* order to correctly interpret the index file, pick character that`
			`* is outside the range, to cause the reader to abort.`
			`*/`

			`#define CACHE_EXT(s) ( (s[0]<<24)\|(s[1]<<16)\|(s[2]<<8)\|(s[3]) )`
			`#define CACHE_EXT_TREE 0x54524545 /* "TREE" */`
Correct spelling of 'REUC' extension The new dircache extension CACHE_EXT_RESOLVE_UNDO, whose value is 0x52455543, is actually the ASCII sequence 'REUC', not the ASCII sequence 'REUN'. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-02-02 16:33:28 +01:00			`#define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00
Move index-related variables into a structure. This defines a index_state structure and moves index-related global variables into it. Currently there is one instance of it, the_index, and everybody accesses it, so there is no code change. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 03:14:06 +02:00			`struct index_state the_index;`
Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00
lazy index hashing This delays the hashing of index names until it becomes necessary for the first time. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-23 08:01:13 +01:00			`static void set_index_entry(struct index_state istate, int nr, struct cache_entry ce)`
			`{`
			`istate->cache[nr] = ce;`
Move name hashing functions into a file of its own It's really totally separate functionality, and if we want to start doing case-insensitive hash lookups, I'd rather do it when it's separated out. It also renames "remove_index_entry()" to "remove_name_hash()", because that really describes the thing better. It doesn't actually remove the index entry, that's done by "remove_index_entry_at()", which is something very different, despite the similarity in names. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 21:16:24 +01:00			`add_name_hash(istate, ce);`
lazy index hashing This delays the hashing of index names until it becomes necessary for the first time. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-23 08:01:13 +01:00			`}`

Create pathname-based hash-table lookup into index This creates a hash index of every single file added to the index. Right now that hash index isn't actually used for much: I implemented a "cache_name_exists()" function that uses it to efficiently look up a filename in the index without having to do the O(logn) binary search, but quite frankly, that's not why this patch is interesting. No, the whole and only reason to create the hash of the filenames in the index is that by modifying the hash function, you can fairly easily do things like making it always hash equivalent names into the same bucket. That, in turn, means that suddenly questions like "does this name exist in the index under an _equivalent_ name?" becomes much much cheaper. Guiding principles behind this patch: - it shouldn't be too costly. In fact, my primary goal here was to actually speed up "git commit" with a fully populated kernel tree, by being faster at checking whether a file already existed in the index. I did succeed, but only barely: Best before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.255s user 0m0.168s sys 0m0.088s Best after: [torvalds@woody linux]$ time ~/git/git commit > /dev/null real 0m0.233s user 0m0.144s sys 0m0.088s so some things are actually faster (~8%). Caveat: that's really the best case. Other things are invariably going to be slightly slower, since we populate that index cache, and quite frankly, few things really use it to look things up. That said, the cost is really quite small. The worst case is probably doing a "git ls-files", which will do very little except puopulate the index, and never actually looks anything up in it, just lists it. Before: [torvalds@woody linux]$ time git ls-files > /dev/null real 0m0.016s user 0m0.016s sys 0m0.000s After: [torvalds@woody linux]$ time ~/git/git ls-files > /dev/null real 0m0.021s user 0m0.012s sys 0m0.008s and while the thing has really gotten relatively much slower, we're still talking about something almost unmeasurable (eg 5ms). And that really should be pretty much the worst case. So we lose 5ms on one "benchmark", but win 22ms on another. Pick your poison - this patch has the advantage that it will _likely_ speed up the cases that are complex and expensive more than it slows down the cases that are already so fast that nobody cares. But if you look at relative speedups/slowdowns, it doesn't look so good. - It should be simple and clean The code may be a bit subtle (the reasons I do hash removal the way I do etc), but it re-uses the existing hash.c files, so it really is fairly small and straightforward apart from a few odd details. Now, this patch on its own doesn't really do much, but I think it's worth looking at, if only because if done correctly, the name hashing really can make an improvement to the whole issue of "do we have a filename that looks like this in the index already". And at least it gets real testing by being used even by default (ie there is a real use-case for it even without any insane filesystems). NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just not ashamed of it enough to really care. I took all the numbers out of my nether regions - I'm sure it's good enough that it works in practice, but the whole point was that you can make a really much fancier hash that hashes characters not directly, but by their upper-case value or something like that, and thus you get a case-insensitive hash, while still keeping the name and the index itself totally case sensitive. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-23 03:41:14 +01:00			`static void replace_index_entry(struct index_state istate, int nr, struct cache_entry ce)`
			`{`
			`struct cache_entry *old = istate->cache[nr];`

name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`remove_name_hash(istate, old);`
Fix name re-hashing semantics We handled the case of removing and re-inserting cache entries badly, which is something that merging commonly needs to do (removing the different stages, and then re-inserting one of them as the merged state). We even had a rather ugly special case for this failure case, where replace_index_entry() basically turned itself into a no-op if the new and the old entries were the same, exactly because the hash routines didn't handle it on their own. So what this patch does is to not just have the UNHASHED bit, but a HASHED bit too, and when you insert an entry into the name hash, that involves: - clear the UNHASHED bit, because now it's valid again for lookup (which is really all that UNHASHED meant) - if we're being lazy, we're done here (but we still want to clear the UNHASHED bit regardless of lazy mode, since we can become unlazy later, and so we need the UNHASHED bit to always be set correctly, even if we never actually insert the entry into the hash list) - if it was already hashed, we just leave it on the list - otherwise mark it HASHED and insert it into the list this all means that unhashing and rehashing a name all just works automatically. Obviously, you cannot change the name of an entry (that would be a serious bug), but nothing can validly do that anyway (you'd have to allocate a new struct cache_entry anyway since the name length could change), so that's not a new limitation. The code actually gets simpler in many ways, although the lazy hashing does mean that there are a few odd cases (ie something can be marked unhashed even though it was never on the hash in the first place, and isn't actually marked hashed!). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-23 05:37:40 +01:00			`set_index_entry(istate, nr, ce);`
Create pathname-based hash-table lookup into index This creates a hash index of every single file added to the index. Right now that hash index isn't actually used for much: I implemented a "cache_name_exists()" function that uses it to efficiently look up a filename in the index without having to do the O(logn) binary search, but quite frankly, that's not why this patch is interesting. No, the whole and only reason to create the hash of the filenames in the index is that by modifying the hash function, you can fairly easily do things like making it always hash equivalent names into the same bucket. That, in turn, means that suddenly questions like "does this name exist in the index under an _equivalent_ name?" becomes much much cheaper. Guiding principles behind this patch: - it shouldn't be too costly. In fact, my primary goal here was to actually speed up "git commit" with a fully populated kernel tree, by being faster at checking whether a file already existed in the index. I did succeed, but only barely: Best before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.255s user 0m0.168s sys 0m0.088s Best after: [torvalds@woody linux]$ time ~/git/git commit > /dev/null real 0m0.233s user 0m0.144s sys 0m0.088s so some things are actually faster (~8%). Caveat: that's really the best case. Other things are invariably going to be slightly slower, since we populate that index cache, and quite frankly, few things really use it to look things up. That said, the cost is really quite small. The worst case is probably doing a "git ls-files", which will do very little except puopulate the index, and never actually looks anything up in it, just lists it. Before: [torvalds@woody linux]$ time git ls-files > /dev/null real 0m0.016s user 0m0.016s sys 0m0.000s After: [torvalds@woody linux]$ time ~/git/git ls-files > /dev/null real 0m0.021s user 0m0.012s sys 0m0.008s and while the thing has really gotten relatively much slower, we're still talking about something almost unmeasurable (eg 5ms). And that really should be pretty much the worst case. So we lose 5ms on one "benchmark", but win 22ms on another. Pick your poison - this patch has the advantage that it will _likely_ speed up the cases that are complex and expensive more than it slows down the cases that are already so fast that nobody cares. But if you look at relative speedups/slowdowns, it doesn't look so good. - It should be simple and clean The code may be a bit subtle (the reasons I do hash removal the way I do etc), but it re-uses the existing hash.c files, so it really is fairly small and straightforward apart from a few odd details. Now, this patch on its own doesn't really do much, but I think it's worth looking at, if only because if done correctly, the name hashing really can make an improvement to the whole issue of "do we have a filename that looks like this in the index already". And at least it gets real testing by being used even by default (ie there is a real use-case for it even without any insane filesystems). NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just not ashamed of it enough to really care. I took all the numbers out of my nether regions - I'm sure it's good enough that it works in practice, but the whole point was that you can make a really much fancier hash that hashes characters not directly, but by their upper-case value or something like that, and thus you get a case-insensitive hash, while still keeping the name and the index itself totally case sensitive. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-23 03:41:14 +01:00			`istate->cache_changed = 1;`
			`}`

git-mv: Keep moved index entries inact The rewrite of git-mv from a shell script to a builtin was perhaps a little too straightforward: the git add and git rm queues were emulated directly, which resulted in a rather complicated code and caused an inconsistent behaviour when moving dirty index entries; git mv would update the entry based on working tree state, except in case of overwrites, where the new entry would still have sha1 of the old file. This patch introduces rename_index_entry_at() into the index toolkit, which will rename an entry while removing any entries the new entry might render duplicate. This is then used in git mv instead of all the file queues, resulting in a major simplification of the code and an inevitable change in git mv -n output format. Also the code used to refuse renaming overwriting symlink with a regular file and vice versa; there is no need for that. A few new tests have been added to the testsuite to reflect this change. Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-21 02:25:56 +02:00			`void rename_index_entry_at(struct index_state istate, int nr, const char new_name)`
			`{`
			`struct cache_entry old = istate->cache[nr], new;`
			`int namelen = strlen(new_name);`

			`new = xmalloc(cache_entry_size(namelen));`
			`copy_cache_entry(new, old);`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`new->ce_flags &= ~CE_STATE_MASK;`
			`new->ce_namelen = namelen;`
git-mv: Keep moved index entries inact The rewrite of git-mv from a shell script to a builtin was perhaps a little too straightforward: the git add and git rm queues were emulated directly, which resulted in a rather complicated code and caused an inconsistent behaviour when moving dirty index entries; git mv would update the entry based on working tree state, except in case of overwrites, where the new entry would still have sha1 of the old file. This patch introduces rename_index_entry_at() into the index toolkit, which will rename an entry while removing any entries the new entry might render duplicate. This is then used in git mv instead of all the file queues, resulting in a major simplification of the code and an inevitable change in git mv -n output format. Also the code used to refuse renaming overwriting symlink with a regular file and vice versa; there is no need for that. A few new tests have been added to the testsuite to reflect this change. Signed-off-by: Petr Baudis <pasky@suse.cz> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-21 02:25:56 +02:00			`memcpy(new->name, new_name, namelen + 1);`

			`cache_tree_invalidate_path(istate->cache_tree, old->name);`
			`remove_index_entry_at(istate, nr);`
			`add_index_entry(istate, new, ADD_CACHE_OK_TO_ADD\|ADD_CACHE_OK_TO_REPLACE);`
			`}`

Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`void fill_stat_data(struct stat_data sd, struct stat st)`
			`{`
			`sd->sd_ctime.sec = (unsigned int)st->st_ctime;`
			`sd->sd_mtime.sec = (unsigned int)st->st_mtime;`
			`sd->sd_ctime.nsec = ST_CTIME_NSEC(*st);`
			`sd->sd_mtime.nsec = ST_MTIME_NSEC(*st);`
			`sd->sd_dev = st->st_dev;`
			`sd->sd_ino = st->st_ino;`
			`sd->sd_uid = st->st_uid;`
			`sd->sd_gid = st->st_gid;`
			`sd->sd_size = st->st_size;`
			`}`

			`int match_stat_data(const struct stat_data sd, struct stat st)`
			`{`
			`int changed = 0;`

			`if (sd->sd_mtime.sec != (unsigned int)st->st_mtime)`
			`changed \|= MTIME_CHANGED;`
			`if (trust_ctime && check_stat &&`
			`sd->sd_ctime.sec != (unsigned int)st->st_ctime)`
			`changed \|= CTIME_CHANGED;`

			`#ifdef USE_NSEC`
			`if (check_stat && sd->sd_mtime.nsec != ST_MTIME_NSEC(*st))`
			`changed \|= MTIME_CHANGED;`
			`if (trust_ctime && check_stat &&`
			`sd->sd_ctime.nsec != ST_CTIME_NSEC(*st))`
			`changed \|= CTIME_CHANGED;`
			`#endif`

			`if (check_stat) {`
			`if (sd->sd_uid != (unsigned int) st->st_uid \|\|`
			`sd->sd_gid != (unsigned int) st->st_gid)`
			`changed \|= OWNER_CHANGED;`
			`if (sd->sd_ino != (unsigned int) st->st_ino)`
			`changed \|= INODE_CHANGED;`
			`}`

			`#ifdef USE_STDEV`
			`/*`
			`* st_dev breaks on network filesystems where different`
			`* clients will have different views of what "device"`
			`* the filesystem is on`
			`*/`
			`if (check_stat && sd->sd_dev != (unsigned int) st->st_dev)`
			`changed \|= INODE_CHANGED;`
			`#endif`

			`if (sd->sd_size != (unsigned int) st->st_size)`
			`changed \|= DATA_CHANGED;`

			`return changed;`
			`}`

[PATCH] Implement git-checkout-cache -u to update stat information in the cache. With -u flag, git-checkout-cache picks up the stat information from newly created file and updates the cache. This removes the need to run git-update-cache --refresh immediately after running git-checkout-cache. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-15 23:23:12 +02:00			`/*`
			`* This only updates the "non-critical" parts of the directory`
			`* cache, ie the parts that aren't tracked by GIT, and only used`
			`* to validate the cache.`
			`*/`
			`void fill_stat_cache_info(struct cache_entry ce, struct stat st)`
			`{`
Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`fill_stat_data(&ce->ce_stat_data, st);`
"Assume unchanged" git This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-09 06:15:24 +01:00
			`if (assume_unchanged)`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`ce->ce_flags \|= CE_VALID;`
Avoid running lstat(2) on the same cache entry. Aside from the lstat(2) done for work tree files, there are quite many lstat(2) calls in refname dwimming codepath. This patch is not about reducing them. * It adds a new ce_flag, CE_UPTODATE, that is meant to mark the cache entries that record a regular file blob that is up to date in the work tree. If somebody later walks the index and wants to see if the work tree has changes, they do not have to be checked with lstat(2) again. * fill_stat_cache_info() marks the cache entry it just added with CE_UPTODATE. This has the effect of marking the paths we write out of the index and lstat(2) immediately as "no need to lstat -- we know it is up-to-date", from quite a lot fo callers: - git-apply --index - git-update-index - git-checkout-index - git-add (uses add_file_to_index()) - git-commit (ditto) - git-mv (ditto) * refresh_cache_ent() also marks the cache entry that are clean with CE_UPTODATE. * write_index is changed not to write CE_UPTODATE out to the index file, because CE_UPTODATE is meant to be transient only in core. For the same reason, CE_UPDATE is not written to prevent an accident from happening. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-19 08:45:24 +01:00
			`if (S_ISREG(st->st_mode))`
			`ce_mark_uptodate(ce);`
[PATCH] Implement git-checkout-cache -u to update stat information in the cache. With -u flag, git-checkout-cache picks up the stat information from newly created file and updates the cache. This removes the need to run git-update-cache --refresh immediately after running git-checkout-cache. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-15 23:23:12 +02:00			`}`

read-cache: mark cache_entry pointers const ie_match_stat and ie_modified only derefence their struct cache_entry pointers for reading. Add const to the parameter declaration here and do the same for the static helper function used by them, as it's the same there as well. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:52 +02:00			`static int ce_compare_data(const struct cache_entry ce, struct stat st)`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`{`
			`int match = -1;`
			`int fd = open(ce->name, O_RDONLY);`

			`if (fd >= 0) {`
			`unsigned char sha1[20];`
index_fd(): turn write_object and format_check arguments into one flag The "format_check" parameter tucked after the existing parameters is too ugly an afterthought to live in any reasonable API. Combine it with the other boolean parameter "write_object" into a single "flags" parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-08 10:47:33 +02:00			`if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))`
Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char sha1, const unsigned char sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-17 20:54:57 +02:00			`match = hashcmp(sha1, ce->sha1);`
Fix double "close()" in ce_compare_data Doing an "strace" on "git diff" shows that we close() a file descriptor twice (getting EBADFD on the second one) when we end up in ce_compare_data if the index does not match the checked-out stat information. The "index_fd()" function will already have closed the fd for us, so we should not close it again. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-31 18:55:15 +02:00			`/* index_fd() closed the file descriptor already */`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`}`
			`return match;`
			`}`

read-cache: mark cache_entry pointers const ie_match_stat and ie_modified only derefence their struct cache_entry pointers for reading. Add const to the parameter declaration here and do the same for the static helper function used by them, as it's the same there as well. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:52 +02:00			`static int ce_compare_link(const struct cache_entry *ce, size_t expected_size)`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`{`
			`int match = -1;`
			`void *buffer;`
			`unsigned long size;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
Make 'ce_compare_link()' use the new 'strbuf_readlink()' This simplifies the code, and also makes ce_compare_link now able to handle filesystems with odd 'st_size' return values for symlinks. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-12-17 18:47:27 +01:00			`struct strbuf sb = STRBUF_INIT;`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00
Make 'ce_compare_link()' use the new 'strbuf_readlink()' This simplifies the code, and also makes ce_compare_link now able to handle filesystems with odd 'st_size' return values for symlinks. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-12-17 18:47:27 +01:00			`if (strbuf_readlink(&sb, ce->name, expected_size))`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`return -1;`
Make 'ce_compare_link()' use the new 'strbuf_readlink()' This simplifies the code, and also makes ce_compare_link now able to handle filesystems with odd 'st_size' return values for symlinks. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-12-17 18:47:27 +01:00
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`buffer = read_sha1_file(ce->sha1, &type, &size);`
Make 'ce_compare_link()' use the new 'strbuf_readlink()' This simplifies the code, and also makes ce_compare_link now able to handle filesystems with odd 'st_size' return values for symlinks. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-12-17 18:47:27 +01:00			`if (buffer) {`
			`if (size == sb.len)`
			`match = memcmp(buffer, sb.buf, size);`
			`free(buffer);`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`}`
Make 'ce_compare_link()' use the new 'strbuf_readlink()' This simplifies the code, and also makes ce_compare_link now able to handle filesystems with odd 'st_size' return values for symlinks. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-12-17 18:47:27 +01:00			`strbuf_release(&sb);`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`return match;`
			`}`

read-cache: mark cache_entry pointers const ie_match_stat and ie_modified only derefence their struct cache_entry pointers for reading. Add const to the parameter declaration here and do the same for the static helper function used by them, as it's the same there as well. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:52 +02:00			`static int ce_compare_gitlink(const struct cache_entry *ce)`
Teach core object handling functions about gitlinks This teaches the really fundamental core SHA1 object handling routines about gitlinks. We can compare trees with gitlinks in them (although we can not actually generate patches for them yet - just raw git diffs), and they show up as commits in "git ls-tree". We also know to compare gitlinks as if they were directories (ie the normal "sort as trees" rules apply). [jc: amended a cut&paste error] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-10 06:20:29 +02:00			`{`
			`unsigned char sha1[20];`

			`/*`
			`* We don't actually require that the .git directory`
rename dirlink to gitlink. Unify naming of plumbing dirlink/gitlink concept: git ls-files -z '*.[ch]' \| xargs -0 perl -pi -e 's/dirlink/gitlink/g;' -e 's/DIRLNK/GITLINK/g;' Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-21 22:08:28 +02:00			`* under GITLINK directory be a valid git directory. It`
Teach core object handling functions about gitlinks This teaches the really fundamental core SHA1 object handling routines about gitlinks. We can compare trees with gitlinks in them (although we can not actually generate patches for them yet - just raw git diffs), and they show up as commits in "git ls-tree". We also know to compare gitlinks as if they were directories (ie the normal "sort as trees" rules apply). [jc: amended a cut&paste error] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-10 06:20:29 +02:00			`* might even be missing (in case nobody populated that`
			`* sub-project).`
			`*`
			`* If so, we consider it always to match.`
			`*/`
			`if (resolve_gitlink_ref(ce->name, "HEAD", sha1) < 0)`
			`return 0;`
			`return hashcmp(sha1, ce->sha1);`
			`}`

read-cache: mark cache_entry pointers const ie_match_stat and ie_modified only derefence their struct cache_entry pointers for reading. Add const to the parameter declaration here and do the same for the static helper function used by them, as it's the same there as well. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:52 +02:00			`static int ce_modified_check_fs(const struct cache_entry ce, struct stat st)`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`{`
			`switch (st->st_mode & S_IFMT) {`
			`case S_IFREG:`
			`if (ce_compare_data(ce, st))`
			`return DATA_CHANGED;`
			`break;`
			`case S_IFLNK:`
Cast 64 bit off_t to 32 bit size_t Some systems have sizeof(off_t) == 8 while sizeof(size_t) == 4. This implies that we are able to access and work on files whose maximum length is around 2^63-1 bytes, but we can only malloc or mmap somewhat less than 2^32-1 bytes of memory. On such a system an implicit conversion of off_t to size_t can cause the size_t to wrap, resulting in unexpected and exciting behavior. Right now we are working around all gcc warnings generated by the -Wshorten-64-to-32 option by passing the off_t through xsize_t(). In the future we should make xsize_t on such problematic platforms detect the wrapping and die if such a file is accessed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-07 02:44:37 +01:00			`if (ce_compare_link(ce, xsize_t(st->st_size)))`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`return DATA_CHANGED;`
			`break;`
Fix gitlink index entry filesystem matching The code to match up index entries with the filesystem was stupidly broken. We shouldn't compare the filesystem stat() information with S_IFDIRLNK, since that's purely a git-internal value, and not what the filesystem uses (on the filesystem, it's just a regular directory). Also, don't bother to make the stat() time comparisons etc for DIRLNK entries in ce_match_stat_basic(), since we do an exact match for these things, and the hints in the stat data simply doesn't matter. This fixes "git status" with submodules that haven't been checked out in the supermodule. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 18:24:13 +02:00			`case S_IFDIR:`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`if (S_ISGITLINK(ce->ce_mode))`
Teach gitlinks to ie_modified() and ce_modified_check_fs() The ie_modified() function is the workhorse for refresh_cache_entry(), i.e. checking if an index entry that is stat-dirty actually has changes. After running quicker check to compare cached stat information with results from the latest lstat(2) to answer "has modification" early, the code goes on to check if there really is a change by comparing the staged data with what is on the filesystem by asking ce_modified_check_fs(). However, this function always said "no change" for any gitlinks that has a directory at the corresponding path. This made ie_modified() to miss actual changes in the subproject. The patch fixes this first by modifying an existing short-circuit logic before calling the ce_modified_check_fs() function. It knows that for any filesystem entity to which ie_match_stat() says its data has changed, if its cached size is nonzero then the contents cannot match, which is a correct optimization only for blob objects. We teach gitlink objects to this special case, as we already know that any gitlink that ie_match_stat() says is modified is indeed modified at this point in the codepath. With the above change, we could leave ce_modified_check_fs() broken, but it also futureproofs the code by teaching it to use ce_compare_gitlink(), instead of assuming (incorrectly) that any directory is unchanged. Originally noticed by Alex Riesen on Cygwin. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-29 10:13:44 +02:00			`return ce_compare_gitlink(ce) ? DATA_CHANGED : 0;`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`default:`
			`return TYPE_CHANGED;`
			`}`
			`return 0;`
			`}`

read-cache: mark cache_entry pointers const ie_match_stat and ie_modified only derefence their struct cache_entry pointers for reading. Add const to the parameter declaration here and do the same for the static helper function used by them, as it's the same there as well. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:52 +02:00			`static int ce_match_stat_basic(const struct cache_entry ce, struct stat st)`
Make the cache stat information comparator public. Like the cache filename finder, it's a generically useful function, rather than something specific to the current "show-diff" thing. 2005-04-09 18:48:20 +02:00			`{`
			`unsigned int changed = 0;`

Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`if (ce->ce_flags & CE_REMOVE)`
			`return MODE_CHANGED \| DATA_CHANGED \| TYPE_CHANGED;`

			`switch (ce->ce_mode & S_IFMT) {`
[PATCH] git and symlinks as tracked content Allow to store and track symlink in the repository. A symlink is stored the same way as a regular file, only with the appropriate mode bits set. The symlink target is therefore stored in a blob object. This will hopefully make our udev repository fully functional. :) Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-05 14:38:25 +02:00			`case S_IFREG:`
			`changed \|= !S_ISREG(st->st_mode) ? TYPE_CHANGED : 0;`
Use core.filemode. With "[core] filemode = false", you can tell git to ignore differences in the working tree file only in executable bit. * "git-update-index --refresh" does not say "needs update" if index entry and working tree file differs only in executable bit. * "git-update-index" on an existing path takes executable bit from the existing index entry, if the path and index entry are both regular files. * "git-diff-files" and "git-diff-index" without --cached flag pretend the path on the filesystem has the same executable bit as the existing index entry, if the path and index entry are both regular files. If you are on a filesystem with unreliable mode bits, you may need to force the executable bit after registering the path in the index. * "git-update-index --chmod=+x foo" flips the executable bit of the index file entry for path "foo" on. Use "--chmod=-x" to flip it off. Note that --chmod only works in index file and does not look at nor update the working tree. So if you are on a filesystem and do not have working executable bit, you would do: 1. set the appropriate .git/config option; 2. "git-update-index --add new-file.c" 3. "git-ls-files --stage new-file.c" to see if it has the desired mode bits. If not, e.g. to drop executable bit picked up from the filesystem, say "git-update-index --chmod=-x new-file.c". Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 03:45:33 +02:00			`/* We consider only the owner x bit to be relevant for`
			`* "mode changes"`
			`*/`
			`if (trust_executable_bit &&`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`(0100 & (ce->ce_mode ^ st->st_mode)))`
[PATCH] fix compare symlink against readlink not data Fix update-cache to compare the blob of a symlink against the link-target and not the file it points to. Also ignore all permissions applied to links. Thanks to Greg for recognizing this while he added our list of symlinks back to the udev repository. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-06 15:45:01 +02:00			`changed \|= MODE_CHANGED;`
[PATCH] git and symlinks as tracked content Allow to store and track symlink in the repository. A symlink is stored the same way as a regular file, only with the appropriate mode bits set. The symlink target is therefore stored in a blob object. This will hopefully make our udev repository fully functional. :) Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-05 14:38:25 +02:00			`break;`
			`case S_IFLNK:`
Add core.symlinks to mark filesystems that do not support symbolic links. Some file systems that can host git repositories and their working copies do not support symbolic links. But then if the repository contains a symbolic link, it is impossible to check out the working copy. This patch enables partial support of symbolic links so that it is possible to check out a working copy on such a file system. A new flag core.symlinks (which is true by default) can be set to false to indicate that the filesystem does not support symbolic links. In this case, symbolic links that exist in the trees are checked out as small plain files, and checking in modifications of these files preserve the symlink property in the database (as long as an entry exists in the index). Of course, this does not magically make symbolic links work on such defective file systems; hence, this solution does not help if the working copy relies on that an entry is a real symbolic link. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-02 22:11:30 +01:00			`if (!S_ISLNK(st->st_mode) &&`
			`(has_symlinks \|\| !S_ISREG(st->st_mode)))`
			`changed \|= TYPE_CHANGED;`
[PATCH] git and symlinks as tracked content Allow to store and track symlink in the repository. A symlink is stored the same way as a regular file, only with the appropriate mode bits set. The symlink target is therefore stored in a blob object. This will hopefully make our udev repository fully functional. :) Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-05 14:38:25 +02:00			`break;`
rename dirlink to gitlink. Unify naming of plumbing dirlink/gitlink concept: git ls-files -z '*.[ch]' \| xargs -0 perl -pi -e 's/dirlink/gitlink/g;' -e 's/DIRLNK/GITLINK/g;' Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-21 22:08:28 +02:00			`case S_IFGITLINK:`
Teach gitlinks to ie_modified() and ce_modified_check_fs() The ie_modified() function is the workhorse for refresh_cache_entry(), i.e. checking if an index entry that is stat-dirty actually has changes. After running quicker check to compare cached stat information with results from the latest lstat(2) to answer "has modification" early, the code goes on to check if there really is a change by comparing the staged data with what is on the filesystem by asking ce_modified_check_fs(). However, this function always said "no change" for any gitlinks that has a directory at the corresponding path. This made ie_modified() to miss actual changes in the subproject. The patch fixes this first by modifying an existing short-circuit logic before calling the ce_modified_check_fs() function. It knows that for any filesystem entity to which ie_match_stat() says its data has changed, if its cached size is nonzero then the contents cannot match, which is a correct optimization only for blob objects. We teach gitlink objects to this special case, as we already know that any gitlink that ie_match_stat() says is modified is indeed modified at this point in the codepath. With the above change, we could leave ce_modified_check_fs() broken, but it also futureproofs the code by teaching it to use ce_compare_gitlink(), instead of assuming (incorrectly) that any directory is unchanged. Originally noticed by Alex Riesen on Cygwin. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-29 10:13:44 +02:00			`/* We ignore most of the st_xxx fields for gitlinks */`
Teach core object handling functions about gitlinks This teaches the really fundamental core SHA1 object handling routines about gitlinks. We can compare trees with gitlinks in them (although we can not actually generate patches for them yet - just raw git diffs), and they show up as commits in "git ls-tree". We also know to compare gitlinks as if they were directories (ie the normal "sort as trees" rules apply). [jc: amended a cut&paste error] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-10 06:20:29 +02:00			`if (!S_ISDIR(st->st_mode))`
			`changed \|= TYPE_CHANGED;`
			`else if (ce_compare_gitlink(ce))`
			`changed \|= DATA_CHANGED;`
Fix gitlink index entry filesystem matching The code to match up index entries with the filesystem was stupidly broken. We shouldn't compare the filesystem stat() information with S_IFDIRLNK, since that's purely a git-internal value, and not what the filesystem uses (on the filesystem, it's just a regular directory). Also, don't bother to make the stat() time comparisons etc for DIRLNK entries in ce_match_stat_basic(), since we do an exact match for these things, and the hints in the stat data simply doesn't matter. This fixes "git status" with submodules that haven't been checked out in the supermodule. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 18:24:13 +02:00			`return changed;`
[PATCH] git and symlinks as tracked content Allow to store and track symlink in the repository. A symlink is stored the same way as a regular file, only with the appropriate mode bits set. The symlink target is therefore stored in a blob object. This will hopefully make our udev repository fully functional. :) Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-05 14:38:25 +02:00			`default:`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`die("internal error: ce_mode is %o", ce->ce_mode);`
[PATCH] git and symlinks as tracked content Allow to store and track symlink in the repository. A symlink is stored the same way as a regular file, only with the appropriate mode bits set. The symlink target is therefore stored in a blob object. This will hopefully make our udev repository fully functional. :) Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-05 14:38:25 +02:00			`}`
Don't care about st_dev in the index file Thomas Glanzmann points out that it doesn't work well with different clients accessing the repository over NFS - they have different views on what the "device" for the filesystem is. Of course, other filesystems may not even have stable inode numbers. But we don't care. At least for now. 2005-05-23 00:08:15 +02:00
Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`changed \|= match_stat_data(&ce->ce_stat_data, st);`
Show modified files in git-ls-files Add -m/--modified to show files that have been modified wrt. the index. [jc: The original came from Brian Gerst on Sep 1st but it only checked if the paths were cache dirty without actually checking the files were modified. I also added the usage string and a new test.] Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-20 00:11:15 +02:00
racy-git: an empty blob has a fixed object name We use size=0 as the magic token to say the entry is known to be racily clean, but a sequence that does: - update the path with a non-empty blob and write the index; - update an unrelated path and write the index -- this smudges the above entry; - truncate the path to size zero. would make both the size field for the path in the index and the size on the filesystem zero. We should not mistake it as a clean index entry. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-06-10 19:44:43 +02:00			`/* Racily smudged entry? */`
Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`if (!ce->ce_stat_data.sd_size) {`
racy-git: an empty blob has a fixed object name We use size=0 as the magic token to say the entry is known to be racily clean, but a sequence that does: - update the path with a non-empty blob and write the index; - update an unrelated path and write the index -- this smudges the above entry; - truncate the path to size zero. would make both the size field for the path in the index and the size on the filesystem zero. We should not mistake it as a clean index entry. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-06-10 19:44:43 +02:00			`if (!is_empty_blob_sha1(ce->sha1))`
			`changed \|= DATA_CHANGED;`
			`}`

Racy GIT (part #2) The previous round caught the most trivial case well, but broke down once index file is updated again. Smudge problematic entries (they should be very few if any under normal interactive workflow) before writing a new index file out. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 21:12:18 +01:00			`return changed;`
			`}`

read-cache: mark cache_entry pointers const ie_match_stat and ie_modified only derefence their struct cache_entry pointers for reading. Add const to the parameter declaration here and do the same for the static helper function used by them, as it's the same there as well. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:52 +02:00			`static int is_racy_timestamp(const struct index_state *istate,`
			`const struct cache_entry *ce)`
read-cache.c: introduce is_racy_timestamp() helper This moves a common boolean expression into a helper function, and makes the comparison between filesystem timestamp and index timestamp done in the function in line with the other places. st.st_mtime should be casted to (unsigned int) when compared to an index timestamp ce_mtime. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-21 09:44:50 +01:00			`{`
is_racy_timestamp(): do not check timestamp for gitlinks Because we do not even check the timestamp to determie if a gitlink is up to date or not, triggering the racy-timestamp check for gitlinks does not make sense. This fixes the recently added test in t7506. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-04 02:24:28 +02:00			`return (!S_ISGITLINK(ce->ce_mode) &&`
make USE_NSEC work as expected Since the filesystem ext4 is now defined as stable in Linux v2.6.28, and ext4 supports nanonsecond resolution timestamps natively, it is time to make USE_NSEC work as expected. This will make racy git situations less likely to happen. For 'git checkout' this means it will be less likely that we have to open, read the contents of the file into RAM, and check if file is really modified or not. The result sould be a litle less used CPU time, less pagefaults and a litle faster program, at least for 'git checkout'. Since the number of possible racy git situations would increase when disks gets faster, this patch would be more and more helpfull as times go by. For a fast Solid State Disk, this patch should be helpfull. Note that, when file operations starts to take less than 1 nanosecond, one would again start to get more racy git situations. For more info on racy git, see Documentation/technical/racy-git.txt For more info on ext4, see http://kernelnewbies.org/Ext4 Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-19 21:08:29 +01:00			`istate->timestamp.sec &&`
			`#ifdef USE_NSEC`
			`/* nanosecond timestamped files can also be racy! */`
Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`(istate->timestamp.sec < ce->ce_stat_data.sd_mtime.sec \|\|`
			`(istate->timestamp.sec == ce->ce_stat_data.sd_mtime.sec &&`
			`istate->timestamp.nsec <= ce->ce_stat_data.sd_mtime.nsec))`
make USE_NSEC work as expected Since the filesystem ext4 is now defined as stable in Linux v2.6.28, and ext4 supports nanonsecond resolution timestamps natively, it is time to make USE_NSEC work as expected. This will make racy git situations less likely to happen. For 'git checkout' this means it will be less likely that we have to open, read the contents of the file into RAM, and check if file is really modified or not. The result sould be a litle less used CPU time, less pagefaults and a litle faster program, at least for 'git checkout'. Since the number of possible racy git situations would increase when disks gets faster, this patch would be more and more helpfull as times go by. For a fast Solid State Disk, this patch should be helpfull. Note that, when file operations starts to take less than 1 nanosecond, one would again start to get more racy git situations. For more info on racy git, see Documentation/technical/racy-git.txt For more info on ext4, see http://kernelnewbies.org/Ext4 Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-19 21:08:29 +01:00			`#else`
Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`istate->timestamp.sec <= ce->ce_stat_data.sd_mtime.sec`
make USE_NSEC work as expected Since the filesystem ext4 is now defined as stable in Linux v2.6.28, and ext4 supports nanonsecond resolution timestamps natively, it is time to make USE_NSEC work as expected. This will make racy git situations less likely to happen. For 'git checkout' this means it will be less likely that we have to open, read the contents of the file into RAM, and check if file is really modified or not. The result sould be a litle less used CPU time, less pagefaults and a litle faster program, at least for 'git checkout'. Since the number of possible racy git situations would increase when disks gets faster, this patch would be more and more helpfull as times go by. For a fast Solid State Disk, this patch should be helpfull. Note that, when file operations starts to take less than 1 nanosecond, one would again start to get more racy git situations. For more info on racy git, see Documentation/technical/racy-git.txt For more info on ext4, see http://kernelnewbies.org/Ext4 Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-19 21:08:29 +01:00			`#endif`
			`);`
read-cache.c: introduce is_racy_timestamp() helper This moves a common boolean expression into a helper function, and makes the comparison between filesystem timestamp and index timestamp done in the function in line with the other places. st.st_mtime should be casted to (unsigned int) when compared to an index timestamp ce_mtime. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-21 09:44:50 +01:00			`}`

Add 'const' where appropriate to index handling functions This is in an effort to make the source index of 'unpack_trees()' as being const, and thus making the compiler help us verify that we only access it for reading. The constification also extended to some of the hashing helpers that get called indirectly. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-06 21:46:09 +01:00			`int ie_match_stat(const struct index_state *istate,`
read-cache: mark cache_entry pointers const ie_match_stat and ie_modified only derefence their struct cache_entry pointers for reading. Add const to the parameter declaration here and do the same for the static helper function used by them, as it's the same there as well. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:52 +02:00			`const struct cache_entry ce, struct stat st,`
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`unsigned int options)`
Racy GIT (part #2) The previous round caught the most trivial case well, but broke down once index file is updated again. Smudge problematic entries (they should be very few if any under normal interactive workflow) before writing a new index file out. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 21:12:18 +01:00			`{`
"Assume unchanged" git This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-09 06:15:24 +01:00			`unsigned int changed;`
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`int ignore_valid = options & CE_MATCH_IGNORE_VALID;`
ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID Previously CE_MATCH_IGNORE_VALID flag is used by both valid and skip-worktree bits. While the two bits have similar behaviour, sharing this flag means "git update-index --really-refresh" will ignore skip-worktree while it should not. Instead another flag is introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only applies to valid bit. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-14 12:43:58 +01:00			`int ignore_skip_worktree = options & CE_MATCH_IGNORE_SKIP_WORKTREE;`
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`int assume_racy_is_modified = options & CE_MATCH_RACY_IS_DIRTY;`
"Assume unchanged" git This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-09 06:15:24 +01:00
			`/*`
			`* If it's marked as always valid in the index, it's`
			`* valid whatever the checked-out copy says.`
ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID Previously CE_MATCH_IGNORE_VALID flag is used by both valid and skip-worktree bits. While the two bits have similar behaviour, sharing this flag means "git update-index --really-refresh" will ignore skip-worktree while it should not. Instead another flag is introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only applies to valid bit. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-14 12:43:58 +01:00			`*`
			`* skip-worktree has the same effect with higher precedence`
"Assume unchanged" git This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-09 06:15:24 +01:00			`*/`
ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID Previously CE_MATCH_IGNORE_VALID flag is used by both valid and skip-worktree bits. While the two bits have similar behaviour, sharing this flag means "git update-index --really-refresh" will ignore skip-worktree while it should not. Instead another flag is introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only applies to valid bit. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-14 12:43:58 +01:00			`if (!ignore_skip_worktree && ce_skip_worktree(ce))`
			`return 0;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`if (!ignore_valid && (ce->ce_flags & CE_VALID))`
"Assume unchanged" git This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-09 06:15:24 +01:00			`return 0;`

git add --intent-to-add: do not let an empty blob be committed by accident Writing a tree out of an index with an "intent to add" entry is blocked. This implies that you cannot "git commit" from such a state; however you can still do "git commit -a" or "git commit $that_path". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-11-29 04:56:34 +01:00			`/*`
			`* Intent-to-add entries have not been added, so the index entry`
			`* by definition never matches what is in the work tree until it`
			`* actually gets added.`
			`*/`
			`if (ce->ce_flags & CE_INTENT_TO_ADD)`
			`return DATA_CHANGED \| TYPE_CHANGED \| MODE_CHANGED;`

"Assume unchanged" git This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-09 06:15:24 +01:00			`changed = ce_match_stat_basic(ce, st);`
Racy GIT (part #2) The previous round caught the most trivial case well, but broke down once index file is updated again. Smudge problematic entries (they should be very few if any under normal interactive workflow) before writing a new index file out. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 21:12:18 +01:00
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`/*`
			`* Within 1 second of this sequence:`
			`* echo xyzzy >file && git-update-index --add file`
			`* running this command:`
			`* echo frotz >file`
			`* would give a falsely clean cache entry. The mtime and`
			`* length match the cache, and other stat fields do not change.`
			`*`
			`* We could detect this at update-index time (the cache entry`
			`* being registered/updated records the same time as "now")`
			`* and delay the return from git-update-index, but that would`
			`* effectively mean we can make at most one commit per second,`
			`* which is not acceptable. Instead, we check cache entries`
			`* whose mtime are the same as the index file timestamp more`
"Assume unchanged" git This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-09 06:15:24 +01:00			`* carefully than others.`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`*/`
read-cache.c: introduce is_racy_timestamp() helper This moves a common boolean expression into a helper function, and makes the comparison between filesystem timestamp and index timestamp done in the function in line with the other places. st.st_mtime should be casted to (unsigned int) when compared to an index timestamp ce_mtime. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-21 09:44:50 +01:00			`if (!changed && is_racy_timestamp(istate, ce)) {`
Add check program "git-check-racy" This will help counting the racily clean paths, but it should be useless for daily use. Do not even enable it in the makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-16 06:38:07 +02:00			`if (assume_racy_is_modified)`
			`changed \|= DATA_CHANGED;`
			`else`
			`changed \|= ce_modified_check_fs(ce, st);`
			`}`
Show modified files in git-ls-files Add -m/--modified to show files that have been modified wrt. the index. [jc: The original came from Brian Gerst on Sep 1st but it only checked if the paths were cache dirty without actually checking the files were modified. I also added the usage string and a new test.] Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-20 00:11:15 +02:00
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`return changed;`
Show modified files in git-ls-files Add -m/--modified to show files that have been modified wrt. the index. [jc: The original came from Brian Gerst on Sep 1st but it only checked if the paths were cache dirty without actually checking the files were modified. I also added the usage string and a new test.] Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-20 00:11:15 +02:00			`}`

Add 'const' where appropriate to index handling functions This is in an effort to make the source index of 'unpack_trees()' as being const, and thus making the compiler help us verify that we only access it for reading. The constification also extended to some of the hashing helpers that get called indirectly. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-06 21:46:09 +01:00			`int ie_modified(const struct index_state *istate,`
read-cache: mark cache_entry pointers const ie_match_stat and ie_modified only derefence their struct cache_entry pointers for reading. Add const to the parameter declaration here and do the same for the static helper function used by them, as it's the same there as well. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-02 17:46:52 +02:00			`const struct cache_entry *ce,`
			`struct stat *st, unsigned int options)`
Show modified files in git-ls-files Add -m/--modified to show files that have been modified wrt. the index. [jc: The original came from Brian Gerst on Sep 1st but it only checked if the paths were cache dirty without actually checking the files were modified. I also added the usage string and a new test.] Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-20 00:11:15 +02:00			`{`
Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`int changed, changed_fs;`
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00
			`changed = ie_match_stat(istate, ce, st, options);`
Show modified files in git-ls-files Add -m/--modified to show files that have been modified wrt. the index. [jc: The original came from Brian Gerst on Sep 1st but it only checked if the paths were cache dirty without actually checking the files were modified. I also added the usage string and a new test.] Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-20 00:11:15 +02:00			`if (!changed)`
			`return 0;`
			`/*`
			`* If the mode or type has changed, there's no point in trying`
			`* to refresh the entry - it's not going to match`
			`*/`
			`if (changed & (MODE_CHANGED \| TYPE_CHANGED))`
			`return changed;`

Teach gitlinks to ie_modified() and ce_modified_check_fs() The ie_modified() function is the workhorse for refresh_cache_entry(), i.e. checking if an index entry that is stat-dirty actually has changes. After running quicker check to compare cached stat information with results from the latest lstat(2) to answer "has modification" early, the code goes on to check if there really is a change by comparing the staged data with what is on the filesystem by asking ce_modified_check_fs(). However, this function always said "no change" for any gitlinks that has a directory at the corresponding path. This made ie_modified() to miss actual changes in the subproject. The patch fixes this first by modifying an existing short-circuit logic before calling the ce_modified_check_fs() function. It knows that for any filesystem entity to which ie_match_stat() says its data has changed, if its cached size is nonzero then the contents cannot match, which is a correct optimization only for blob objects. We teach gitlink objects to this special case, as we already know that any gitlink that ie_match_stat() says is modified is indeed modified at this point in the codepath. With the above change, we could leave ce_modified_check_fs() broken, but it also futureproofs the code by teaching it to use ce_compare_gitlink(), instead of assuming (incorrectly) that any directory is unchanged. Originally noticed by Alex Riesen on Cygwin. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-29 10:13:44 +02:00			`/*`
			`* Immediately after read-tree or update-index --cacheinfo,`
			`* the length field is zero, as we have never even read the`
			`* lstat(2) information once, and we cannot trust DATA_CHANGED`
			`* returned by ie_match_stat() which in turn was returned by`
			`* ce_match_stat_basic() to signal that the filesize of the`
			`* blob changed. We have to actually go to the filesystem to`
			`* see if the contents match, and if so, should answer "unchanged".`
			`*`
			`* The logic does not apply to gitlinks, as ce_match_stat_basic()`
			`* already has checked the actual HEAD from the filesystem in the`
			`* subproject. If ie_match_stat() already said it is different,`
			`* then we know it is.`
Show modified files in git-ls-files Add -m/--modified to show files that have been modified wrt. the index. [jc: The original came from Brian Gerst on Sep 1st but it only checked if the paths were cache dirty without actually checking the files were modified. I also added the usage string and a new test.] Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-20 00:11:15 +02:00			`*/`
Teach gitlinks to ie_modified() and ce_modified_check_fs() The ie_modified() function is the workhorse for refresh_cache_entry(), i.e. checking if an index entry that is stat-dirty actually has changes. After running quicker check to compare cached stat information with results from the latest lstat(2) to answer "has modification" early, the code goes on to check if there really is a change by comparing the staged data with what is on the filesystem by asking ce_modified_check_fs(). However, this function always said "no change" for any gitlinks that has a directory at the corresponding path. This made ie_modified() to miss actual changes in the subproject. The patch fixes this first by modifying an existing short-circuit logic before calling the ce_modified_check_fs() function. It knows that for any filesystem entity to which ie_match_stat() says its data has changed, if its cached size is nonzero then the contents cannot match, which is a correct optimization only for blob objects. We teach gitlink objects to this special case, as we already know that any gitlink that ie_match_stat() says is modified is indeed modified at this point in the codepath. With the above change, we could leave ce_modified_check_fs() broken, but it also futureproofs the code by teaching it to use ce_compare_gitlink(), instead of assuming (incorrectly) that any directory is unchanged. Originally noticed by Alex Riesen on Cygwin. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-29 10:13:44 +02:00			`if ((changed & DATA_CHANGED) &&`
Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`(S_ISGITLINK(ce->ce_mode) \|\| ce->ce_stat_data.sd_size != 0))`
Show modified files in git-ls-files Add -m/--modified to show files that have been modified wrt. the index. [jc: The original came from Brian Gerst on Sep 1st but it only checked if the paths were cache dirty without actually checking the files were modified. I also added the usage string and a new test.] Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-20 00:11:15 +02:00			`return changed;`

Racy GIT This fixes the longstanding "Racy GIT" problem, which was pretty much there from the beginning of time, but was first demonstrated by Pasky in this message on October 24, 2005: http://marc.theaimsgroup.com/?l=git&m=113014629716878 If you run the following sequence of commands: echo frotz >infocom git update-index --add infocom echo xyzzy >infocom so that the second update to file "infocom" does not change st_mtime, what is recorded as the stat information for the cache entry "infocom" exactly matches what is on the filesystem (owner, group, inum, mtime, ctime, mode, length). After this sequence, we incorrectly think "infocom" file still has string "frotz" in it, and get really confused. E.g. git-diff-files would say there is no change, git-update-index --refresh would not even look at the filesystem to correct the situation. Some ways of working around this issue were already suggested by Linus in the same thread on the same day, including waiting until the next second before returning from update-index if a cache entry written out has the current timestamp, but that means we can make at most one commit per second, and given that the e-mail patch workflow used by Linus needs to process at least 5 commits per second, it is not an acceptable solution. Linus notes that git-apply is primarily used to update the index while processing e-mailed patches, which is true, and git-apply's up-to-date check is fooled by the same problem but luckily in the other direction, so it is not really a big issue, but still it is disturbing. The function ce_match_stat() is called to bypass the comparison against filesystem data when the stat data recorded in the cache entry matches what stat() returns from the filesystem. This patch tackles the problem by changing it to actually go to the filesystem data for cache entries that have the same mtime as the index file itself. This works as long as the index file and working tree files are on the filesystems that share the same monotonic clock. Files on network mounted filesystems sometimes get skewed timestamps compared to "date" output, but as long as working tree files' timestamps are skewed the same way as the index file's, this approach still works. The only problematic files are the ones that have the same timestamp as the index file's, because two file updates that sandwitch the index file update must happen within the same second to trigger the problem. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 09:02:15 +01:00			`changed_fs = ce_modified_check_fs(ce, st);`
			`if (changed_fs)`
			`return changed \| changed_fs;`
Show modified files in git-ls-files Add -m/--modified to show files that have been modified wrt. the index. [jc: The original came from Brian Gerst on Sep 1st but it only checked if the paths were cache dirty without actually checking the files were modified. I also added the usage string and a new test.] Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-20 00:11:15 +02:00			`return 0;`
			`}`

Introduce "base_name_compare()" helper function This one compares two pathnames that may be partial basenames, not full paths. We need to get the path sorting right, since a directory name will sort as if it had the final '/' at the end. 2005-05-20 18:09:18 +02:00			`int base_name_compare(const char *name1, int len1, int mode1,`
			`const char *name2, int len2, int mode2)`
			`{`
			`unsigned char c1, c2;`
			`int len = len1 < len2 ? len1 : len2;`
			`int cmp;`

			`cmp = memcmp(name1, name2, len);`
			`if (cmp)`
			`return cmp;`
			`c1 = name1[len];`
			`c2 = name2[len];`
Fix thinko in subproject entry sorting This fixes a total thinko in my original series: subprojects do not sort like directories, because the index is sorted purely by full pathname, and since a subproject shows up in the index as a normal NUL-terminated string, it never has the issues with sorting with the '/' at the end. So if you have a subproject "proj" and a file "proj.c", the subproject sorts alphabetically before the file in the index (and must thus also sort that way in a tree object, since trees sort as the index). In contrast, it you have two files "proj/file" and "proj.c", the "proj.c" will sort alphabetically before "proj/file" in the index. The index itself, of course, does not actually contain an entry "proj/", but in the tree that gets written out, the tree entry "proj" will sort after the file entry "proj.c", which is the only real magic sorting rule. In other words: the magic sorting rule only affects tree entries, and only affects tree entries that point to other trees (ie are of the type S_IFDIR). Anyway, that thinko just means that we should remove the special case to make S_ISDIRLNK entries sort like S_ISDIR entries. They don't. They sort like normal files. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-11 23:39:12 +02:00			`if (!c1 && S_ISDIR(mode1))`
Introduce "base_name_compare()" helper function This one compares two pathnames that may be partial basenames, not full paths. We need to get the path sorting right, since a directory name will sort as if it had the final '/' at the end. 2005-05-20 18:09:18 +02:00			`c1 = '/';`
Fix thinko in subproject entry sorting This fixes a total thinko in my original series: subprojects do not sort like directories, because the index is sorted purely by full pathname, and since a subproject shows up in the index as a normal NUL-terminated string, it never has the issues with sorting with the '/' at the end. So if you have a subproject "proj" and a file "proj.c", the subproject sorts alphabetically before the file in the index (and must thus also sort that way in a tree object, since trees sort as the index). In contrast, it you have two files "proj/file" and "proj.c", the "proj.c" will sort alphabetically before "proj/file" in the index. The index itself, of course, does not actually contain an entry "proj/", but in the tree that gets written out, the tree entry "proj" will sort after the file entry "proj.c", which is the only real magic sorting rule. In other words: the magic sorting rule only affects tree entries, and only affects tree entries that point to other trees (ie are of the type S_IFDIR). Anyway, that thinko just means that we should remove the special case to make S_ISDIRLNK entries sort like S_ISDIR entries. They don't. They sort like normal files. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-11 23:39:12 +02:00			`if (!c2 && S_ISDIR(mode2))`
Introduce "base_name_compare()" helper function This one compares two pathnames that may be partial basenames, not full paths. We need to get the path sorting right, since a directory name will sort as if it had the final '/' at the end. 2005-05-20 18:09:18 +02:00			`c2 = '/';`
			`return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;`
			`}`

Add 'df_name_compare()' helper function This new helper is identical to base_name_compare(), except it compares conflicting directory/file entries as equal in order to help handling DF conflicts (thus the name). Note that while a directory name compares as equal to a regular file with the new helper, they then individually compare _differently_ to a filename that has a dot after the basename (because '\0' < '.' < '/'). So a directory called "foo/" will compare equal to a file "foo", even though "foo.c" will compare after "foo" and before "foo/" This will be used by routines that want to traverse the git namespace but then handle conflicting entries together when possible. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-06 03:25:10 +01:00			`/*`
			`* df_name_compare() is identical to base_name_compare(), except it`
			`* compares conflicting directory/file entries as equal. Note that`
			`* while a directory name compares as equal to a regular file, they`
			`* then individually compare _differently_ to a filename that has`
			`* a dot after the basename (because '\0' < '.' < '/').`
			`*`
			`* This is used by routines that want to traverse the git namespace`
			`* but then handle conflicting entries together when possible.`
			`*/`
			`int df_name_compare(const char *name1, int len1, int mode1,`
			`const char *name2, int len2, int mode2)`
			`{`
			`int len = len1 < len2 ? len1 : len2, cmp;`
			`unsigned char c1, c2;`

			`cmp = memcmp(name1, name2, len);`
			`if (cmp)`
			`return cmp;`
			`/* Directories and files compare equal (same length, same name) */`
			`if (len1 == len2)`
			`return 0;`
			`c1 = name1[len];`
			`if (!c1 && S_ISDIR(mode1))`
			`c1 = '/';`
			`c2 = name2[len];`
			`if (!c2 && S_ISDIR(mode2))`
			`c2 = '/';`
			`if (c1 == '/' && !c2)`
			`return 0;`
			`if (c2 == '/' && !c1)`
			`return 0;`
			`return c1 - c2;`
			`}`

Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`int cache_name_stage_compare(const char name1, int len1, int stage1, const char name2, int len2, int stage2)`
Make "cache_name_pos()" available to others. It finds the cache entry position for a given name, and is generally useful. Sure, everybody can just scan the active cache array, but since it's sorted, you actually want to search it with a binary search, so let's not duplicate that logic all over the place. 2005-04-09 18:26:55 +02:00			`{`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`int len = len1 < len2 ? len1 : len2;`
			`int cmp;`
Make "cache_name_pos()" available to others. It finds the cache entry position for a given name, and is generally useful. Sure, everybody can just scan the active cache array, but since it's sorted, you actually want to search it with a binary search, so let's not duplicate that logic all over the place. 2005-04-09 18:26:55 +02:00
			`cmp = memcmp(name1, name2, len);`
			`if (cmp)`
			`return cmp;`
			`if (len1 < len2)`
			`return -1;`
			`if (len1 > len2)`
			`return 1;`
"Assume unchanged" git This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-09 06:15:24 +01:00
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`if (stage1 < stage2)`
Make cache entry comparison take the new "state" flag into account. This is what allows us to have multiple states of the same file in the index, and what makes it always sort correctly. 2005-04-16 07:51:44 +02:00			`return -1;`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`if (stage1 > stage2)`
Make cache entry comparison take the new "state" flag into account. This is what allows us to have multiple states of the same file in the index, and what makes it always sort correctly. 2005-04-16 07:51:44 +02:00			`return 1;`
Make "cache_name_pos()" available to others. It finds the cache entry position for a given name, and is generally useful. Sure, everybody can just scan the active cache array, but since it's sorted, you actually want to search it with a binary search, so let's not duplicate that logic all over the place. 2005-04-09 18:26:55 +02:00			`return 0;`
			`}`

Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`int cache_name_compare(const char name1, int len1, const char name2, int len2)`
			`{`
			`return cache_name_stage_compare(name1, len1, 0, name2, len2, 0);`
			`}`

read-cache.c: mark a private file-scope symbol as static Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-16 07:44:31 +02:00			`static int index_name_stage_pos(const struct index_state istate, const char name, int namelen, int stage)`
Make "cache_name_pos()" available to others. It finds the cache entry position for a given name, and is generally useful. Sure, everybody can just scan the active cache array, but since it's sorted, you actually want to search it with a binary search, so let's not duplicate that logic all over the place. 2005-04-09 18:26:55 +02:00			`{`
			`int first, last;`

			`first = 0;`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`last = istate->cache_nr;`
Make "cache_name_pos()" available to others. It finds the cache entry position for a given name, and is generally useful. Sure, everybody can just scan the active cache array, but since it's sorted, you actually want to search it with a binary search, so let's not duplicate that logic all over the place. 2005-04-09 18:26:55 +02:00			`while (last > first) {`
			`int next = (last + first) >> 1;`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`struct cache_entry *ce = istate->cache[next];`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`int cmp = cache_name_stage_compare(name, namelen, stage, ce->name, ce_namelen(ce), ce_stage(ce));`
Make "cache_name_pos()" available to others. It finds the cache entry position for a given name, and is generally useful. Sure, everybody can just scan the active cache array, but since it's sorted, you actually want to search it with a binary search, so let's not duplicate that logic all over the place. 2005-04-09 18:26:55 +02:00			`if (!cmp)`
Fix off-by-one error in removal of cache entry. Also make the return value of "cache_name_pos()" be sane: positive or zero if we found it (it's the index into the cache array), and "-pos-1" to indicate where it should go if we didn't. 2005-04-11 07:06:50 +02:00			`return next;`
Make "cache_name_pos()" available to others. It finds the cache entry position for a given name, and is generally useful. Sure, everybody can just scan the active cache array, but since it's sorted, you actually want to search it with a binary search, so let's not duplicate that logic all over the place. 2005-04-09 18:26:55 +02:00			`if (cmp < 0) {`
			`last = next;`
			`continue;`
			`}`
			`first = next+1;`
			`}`
Fix off-by-one error in removal of cache entry. Also make the return value of "cache_name_pos()" be sane: positive or zero if we found it (it's the index into the cache array), and "-pos-1" to indicate where it should go if we didn't. 2005-04-11 07:06:50 +02:00			`return -first-1;`
Make "cache_name_pos()" available to others. It finds the cache entry position for a given name, and is generally useful. Sure, everybody can just scan the active cache array, but since it's sorted, you actually want to search it with a binary search, so let's not duplicate that logic all over the place. 2005-04-09 18:26:55 +02:00			`}`

Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`int index_name_pos(const struct index_state istate, const char name, int namelen)`
			`{`
			`return index_name_stage_pos(istate, name, namelen, 0);`
			`}`

When inserting a index entry of stage 0, remove all old unmerged entries. This allows you to actually tell git that you've resolved a conflict. 2005-04-16 21:05:45 +02:00			`/* Remove entry, return true if there are more entries to go.. */`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`int remove_index_entry_at(struct index_state *istate, int pos)`
When inserting a index entry of stage 0, remove all old unmerged entries. This allows you to actually tell git that you've resolved a conflict. 2005-04-16 21:05:45 +02:00			`{`
Create pathname-based hash-table lookup into index This creates a hash index of every single file added to the index. Right now that hash index isn't actually used for much: I implemented a "cache_name_exists()" function that uses it to efficiently look up a filename in the index without having to do the O(logn) binary search, but quite frankly, that's not why this patch is interesting. No, the whole and only reason to create the hash of the filenames in the index is that by modifying the hash function, you can fairly easily do things like making it always hash equivalent names into the same bucket. That, in turn, means that suddenly questions like "does this name exist in the index under an _equivalent_ name?" becomes much much cheaper. Guiding principles behind this patch: - it shouldn't be too costly. In fact, my primary goal here was to actually speed up "git commit" with a fully populated kernel tree, by being faster at checking whether a file already existed in the index. I did succeed, but only barely: Best before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.255s user 0m0.168s sys 0m0.088s Best after: [torvalds@woody linux]$ time ~/git/git commit > /dev/null real 0m0.233s user 0m0.144s sys 0m0.088s so some things are actually faster (~8%). Caveat: that's really the best case. Other things are invariably going to be slightly slower, since we populate that index cache, and quite frankly, few things really use it to look things up. That said, the cost is really quite small. The worst case is probably doing a "git ls-files", which will do very little except puopulate the index, and never actually looks anything up in it, just lists it. Before: [torvalds@woody linux]$ time git ls-files > /dev/null real 0m0.016s user 0m0.016s sys 0m0.000s After: [torvalds@woody linux]$ time ~/git/git ls-files > /dev/null real 0m0.021s user 0m0.012s sys 0m0.008s and while the thing has really gotten relatively much slower, we're still talking about something almost unmeasurable (eg 5ms). And that really should be pretty much the worst case. So we lose 5ms on one "benchmark", but win 22ms on another. Pick your poison - this patch has the advantage that it will _likely_ speed up the cases that are complex and expensive more than it slows down the cases that are already so fast that nobody cares. But if you look at relative speedups/slowdowns, it doesn't look so good. - It should be simple and clean The code may be a bit subtle (the reasons I do hash removal the way I do etc), but it re-uses the existing hash.c files, so it really is fairly small and straightforward apart from a few odd details. Now, this patch on its own doesn't really do much, but I think it's worth looking at, if only because if done correctly, the name hashing really can make an improvement to the whole issue of "do we have a filename that looks like this in the index already". And at least it gets real testing by being used even by default (ie there is a real use-case for it even without any insane filesystems). NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just not ashamed of it enough to really care. I took all the numbers out of my nether regions - I'm sure it's good enough that it works in practice, but the whole point was that you can make a really much fancier hash that hashes characters not directly, but by their upper-case value or something like that, and thus you get a case-insensitive hash, while still keeping the name and the index itself totally case sensitive. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-23 03:41:14 +01:00			`struct cache_entry *ce = istate->cache[pos];`

resolve-undo: record resolved conflicts in a new index extension section When resolving a conflict using "git add" to create a stage #0 entry, or "git rm" to remove entries at higher stages, remove_index_entry_at() function is eventually called to remove unmerged (i.e. higher stage) entries from the index. Introduce a "resolve_undo_info" structure and keep track of the removed cache entries, and save it in a new index extension section in the index_state. Operations like "read-tree -m", "merge", "checkout [-m] <branch>" and "reset" are signs that recorded information in the index is no longer necessary. The data is removed from the index extension when operations start; they may leave conflicted entries in the index, and later user actions like "git add" will record their conflicted states afresh. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-25 09:30:51 +01:00			`record_resolve_undo(istate, ce);`
name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`remove_name_hash(istate, ce);`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`istate->cache_changed = 1;`
			`istate->cache_nr--;`
			`if (pos >= istate->cache_nr)`
When inserting a index entry of stage 0, remove all old unmerged entries. This allows you to actually tell git that you've resolved a conflict. 2005-04-16 21:05:45 +02:00			`return 0;`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`memmove(istate->cache + pos,`
			`istate->cache + pos + 1,`
			`(istate->cache_nr - pos) * sizeof(struct cache_entry *));`
When inserting a index entry of stage 0, remove all old unmerged entries. This allows you to actually tell git that you've resolved a conflict. 2005-04-16 21:05:45 +02:00			`return 1;`
			`}`

check_updates(): effective removal of cache entries marked CE_REMOVE Below is oprofile output from GIT command 'git chekcout -q my-v2.6.25' (move from tag v2.6.27 to tag v2.6.25 of the Linux kernel): CPU: Core 2, speed 1999.95 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 20000 Counted INST_RETIRED_ANY_P events (number of instructions retired) with a unit mask of 0x00 (No unit mask) count 20000 CPU_CLK_UNHALT...\|INST_RETIRED:2...\| samples\| %\| samples\| %\| ------------------------------------ 409247 100.000 342878 100.000 git CPU_CLK_UNHALT...\|INST_RETIRED:2...\| samples\| %\| samples\| %\| ------------------------------------ 260476 63.6476 257843 75.1996 libz.so.1.2.3 100876 24.6492 64378 18.7758 kernel-2.6.28.4_2.vmlinux 30850 7.5382 7874 2.2964 libc-2.9.so 14775 3.6103 8390 2.4469 git 2020 0.4936 4325 1.2614 libcrypto.so.0.9.8 191 0.0467 32 0.0093 libpthread-2.9.so 58 0.0142 36 0.0105 ld-2.9.so 1 2.4e-04 0 0 libldap-2.3.so.0.2.31 Detail list of the top 20 function entries (libz counted in one blob): CPU_CLK_UNHALTED INST_RETIRED_ANY_P samples % samples % image name symbol name 260476 63.6862 257843 75.2725 libz.so.1.2.3 /lib/libz.so.1.2.3 16587 4.0555 3636 1.0615 libc-2.9.so memcpy 7710 1.8851 277 0.0809 libc-2.9.so memmove 3679 0.8995 1108 0.3235 kernel-2.6.28.4_2.vmlinux d_validate 3546 0.8670 2607 0.7611 kernel-2.6.28.4_2.vmlinux __getblk 3174 0.7760 1813 0.5293 libc-2.9.so _int_malloc 2396 0.5858 3681 1.0746 kernel-2.6.28.4_2.vmlinux copy_to_user 2270 0.5550 2528 0.7380 kernel-2.6.28.4_2.vmlinux __link_path_walk 2205 0.5391 1797 0.5246 kernel-2.6.28.4_2.vmlinux ext4_mark_iloc_dirty 2103 0.5142 1203 0.3512 kernel-2.6.28.4_2.vmlinux find_first_zero_bit 2077 0.5078 997 0.2911 kernel-2.6.28.4_2.vmlinux do_get_write_access 2070 0.5061 514 0.1501 git cache_name_compare 2043 0.4995 1501 0.4382 kernel-2.6.28.4_2.vmlinux rcu_irq_exit 2022 0.4944 1732 0.5056 kernel-2.6.28.4_2.vmlinux __ext4_get_inode_loc 2020 0.4939 4325 1.2626 libcrypto.so.0.9.8 /usr/lib/libcrypto.so.0.9.8 1965 0.4804 1384 0.4040 git patch_delta 1708 0.4176 984 0.2873 kernel-2.6.28.4_2.vmlinux rcu_sched_grace_period 1682 0.4112 727 0.2122 kernel-2.6.28.4_2.vmlinux sysfs_slab_alias 1659 0.4056 290 0.0847 git find_pack_entry_one 1480 0.3619 1307 0.3816 kernel-2.6.28.4_2.vmlinux ext4_writepage_trans_blocks Notice the memmove line, where the CPU did 7710 / 277 = 27.8 cycles per instruction, and compared to the total cycles spent inside the source code of GIT for this command, all the memmove() calls translates to (7710 * 100) / 14775 = 52.2% of this. Retesting with a GIT program compiled for gcov usage, I found out that the memmove() calls came from remove_index_entry_at() in read-cache.c, where we have: memmove(istate->cache + pos, istate->cache + pos + 1, (istate->cache_nr - pos) * sizeof(struct cache_entry )); remove_index_entry_at() is called 4902 times from check_updates() in unpack-trees.c, and each time called we move each cache_entry pointers (from the removed one) one step to the left. Since we have 28828 entries in the cache this time, and if we on average move half of them each time, we in total move approximately 4902 0.5 * 28828 * 4 = 282 629 712 bytes, or twice this amount if each pointer is 8 bytes (64 bit). OK, is seems that the function check_updates() is called 28 times, so the estimated guess above had been more correct if check_updates() had been called only once, but the point is: we get lots of bytes moved. To fix this, and use an O(N) algorithm instead, where N is the number of cache_entries, we delete/remove all entries in one loop through all entries. From a retest, the new remove_marked_cache_entries() from the patch below, ended up with the following output line from oprofile: 46 0.0105 15 0.0041 git remove_marked_cache_entries If we can trust the numbers from oprofile in this case, we saved approximately ((7710 - 46) * 20000) / (2 * 1000 * 1000 * 1000) = 0.077 seconds CPU time with this fix for this particular test. And notice that now the CPU did only 46 / 15 = 3.1 cycles/instruction. Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-18 23:18:03 +01:00			`/*`
many small typofixes Signed-off-by: Ondřej Bílka <neleai@seznam.cz> Reviewed-by: Marc Branchaud <marcnarc@xiplink.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-29 10:18:21 +02:00			`* Remove all cache entries marked for removal, that is where`
check_updates(): effective removal of cache entries marked CE_REMOVE Below is oprofile output from GIT command 'git chekcout -q my-v2.6.25' (move from tag v2.6.27 to tag v2.6.25 of the Linux kernel): CPU: Core 2, speed 1999.95 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 20000 Counted INST_RETIRED_ANY_P events (number of instructions retired) with a unit mask of 0x00 (No unit mask) count 20000 CPU_CLK_UNHALT...\|INST_RETIRED:2...\| samples\| %\| samples\| %\| ------------------------------------ 409247 100.000 342878 100.000 git CPU_CLK_UNHALT...\|INST_RETIRED:2...\| samples\| %\| samples\| %\| ------------------------------------ 260476 63.6476 257843 75.1996 libz.so.1.2.3 100876 24.6492 64378 18.7758 kernel-2.6.28.4_2.vmlinux 30850 7.5382 7874 2.2964 libc-2.9.so 14775 3.6103 8390 2.4469 git 2020 0.4936 4325 1.2614 libcrypto.so.0.9.8 191 0.0467 32 0.0093 libpthread-2.9.so 58 0.0142 36 0.0105 ld-2.9.so 1 2.4e-04 0 0 libldap-2.3.so.0.2.31 Detail list of the top 20 function entries (libz counted in one blob): CPU_CLK_UNHALTED INST_RETIRED_ANY_P samples % samples % image name symbol name 260476 63.6862 257843 75.2725 libz.so.1.2.3 /lib/libz.so.1.2.3 16587 4.0555 3636 1.0615 libc-2.9.so memcpy 7710 1.8851 277 0.0809 libc-2.9.so memmove 3679 0.8995 1108 0.3235 kernel-2.6.28.4_2.vmlinux d_validate 3546 0.8670 2607 0.7611 kernel-2.6.28.4_2.vmlinux __getblk 3174 0.7760 1813 0.5293 libc-2.9.so _int_malloc 2396 0.5858 3681 1.0746 kernel-2.6.28.4_2.vmlinux copy_to_user 2270 0.5550 2528 0.7380 kernel-2.6.28.4_2.vmlinux __link_path_walk 2205 0.5391 1797 0.5246 kernel-2.6.28.4_2.vmlinux ext4_mark_iloc_dirty 2103 0.5142 1203 0.3512 kernel-2.6.28.4_2.vmlinux find_first_zero_bit 2077 0.5078 997 0.2911 kernel-2.6.28.4_2.vmlinux do_get_write_access 2070 0.5061 514 0.1501 git cache_name_compare 2043 0.4995 1501 0.4382 kernel-2.6.28.4_2.vmlinux rcu_irq_exit 2022 0.4944 1732 0.5056 kernel-2.6.28.4_2.vmlinux __ext4_get_inode_loc 2020 0.4939 4325 1.2626 libcrypto.so.0.9.8 /usr/lib/libcrypto.so.0.9.8 1965 0.4804 1384 0.4040 git patch_delta 1708 0.4176 984 0.2873 kernel-2.6.28.4_2.vmlinux rcu_sched_grace_period 1682 0.4112 727 0.2122 kernel-2.6.28.4_2.vmlinux sysfs_slab_alias 1659 0.4056 290 0.0847 git find_pack_entry_one 1480 0.3619 1307 0.3816 kernel-2.6.28.4_2.vmlinux ext4_writepage_trans_blocks Notice the memmove line, where the CPU did 7710 / 277 = 27.8 cycles per instruction, and compared to the total cycles spent inside the source code of GIT for this command, all the memmove() calls translates to (7710 * 100) / 14775 = 52.2% of this. Retesting with a GIT program compiled for gcov usage, I found out that the memmove() calls came from remove_index_entry_at() in read-cache.c, where we have: memmove(istate->cache + pos, istate->cache + pos + 1, (istate->cache_nr - pos) * sizeof(struct cache_entry )); remove_index_entry_at() is called 4902 times from check_updates() in unpack-trees.c, and each time called we move each cache_entry pointers (from the removed one) one step to the left. Since we have 28828 entries in the cache this time, and if we on average move half of them each time, we in total move approximately 4902 0.5 * 28828 * 4 = 282 629 712 bytes, or twice this amount if each pointer is 8 bytes (64 bit). OK, is seems that the function check_updates() is called 28 times, so the estimated guess above had been more correct if check_updates() had been called only once, but the point is: we get lots of bytes moved. To fix this, and use an O(N) algorithm instead, where N is the number of cache_entries, we delete/remove all entries in one loop through all entries. From a retest, the new remove_marked_cache_entries() from the patch below, ended up with the following output line from oprofile: 46 0.0105 15 0.0041 git remove_marked_cache_entries If we can trust the numbers from oprofile in this case, we saved approximately ((7710 - 46) * 20000) / (2 * 1000 * 1000 * 1000) = 0.077 seconds CPU time with this fix for this particular test. And notice that now the CPU did only 46 / 15 = 3.1 cycles/instruction. Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-18 23:18:03 +01:00			`* CE_REMOVE is set in ce_flags. This is much more effective than`
			`* calling remove_index_entry_at() for each entry to be removed.`
			`*/`
			`void remove_marked_cache_entries(struct index_state *istate)`
			`{`
			`struct cache_entry **ce_array = istate->cache;`
			`unsigned int i, j;`

			`for (i = j = 0; i < istate->cache_nr; i++) {`
			`if (ce_array[i]->ce_flags & CE_REMOVE)`
name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`remove_name_hash(istate, ce_array[i]);`
check_updates(): effective removal of cache entries marked CE_REMOVE Below is oprofile output from GIT command 'git chekcout -q my-v2.6.25' (move from tag v2.6.27 to tag v2.6.25 of the Linux kernel): CPU: Core 2, speed 1999.95 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 20000 Counted INST_RETIRED_ANY_P events (number of instructions retired) with a unit mask of 0x00 (No unit mask) count 20000 CPU_CLK_UNHALT...\|INST_RETIRED:2...\| samples\| %\| samples\| %\| ------------------------------------ 409247 100.000 342878 100.000 git CPU_CLK_UNHALT...\|INST_RETIRED:2...\| samples\| %\| samples\| %\| ------------------------------------ 260476 63.6476 257843 75.1996 libz.so.1.2.3 100876 24.6492 64378 18.7758 kernel-2.6.28.4_2.vmlinux 30850 7.5382 7874 2.2964 libc-2.9.so 14775 3.6103 8390 2.4469 git 2020 0.4936 4325 1.2614 libcrypto.so.0.9.8 191 0.0467 32 0.0093 libpthread-2.9.so 58 0.0142 36 0.0105 ld-2.9.so 1 2.4e-04 0 0 libldap-2.3.so.0.2.31 Detail list of the top 20 function entries (libz counted in one blob): CPU_CLK_UNHALTED INST_RETIRED_ANY_P samples % samples % image name symbol name 260476 63.6862 257843 75.2725 libz.so.1.2.3 /lib/libz.so.1.2.3 16587 4.0555 3636 1.0615 libc-2.9.so memcpy 7710 1.8851 277 0.0809 libc-2.9.so memmove 3679 0.8995 1108 0.3235 kernel-2.6.28.4_2.vmlinux d_validate 3546 0.8670 2607 0.7611 kernel-2.6.28.4_2.vmlinux __getblk 3174 0.7760 1813 0.5293 libc-2.9.so _int_malloc 2396 0.5858 3681 1.0746 kernel-2.6.28.4_2.vmlinux copy_to_user 2270 0.5550 2528 0.7380 kernel-2.6.28.4_2.vmlinux __link_path_walk 2205 0.5391 1797 0.5246 kernel-2.6.28.4_2.vmlinux ext4_mark_iloc_dirty 2103 0.5142 1203 0.3512 kernel-2.6.28.4_2.vmlinux find_first_zero_bit 2077 0.5078 997 0.2911 kernel-2.6.28.4_2.vmlinux do_get_write_access 2070 0.5061 514 0.1501 git cache_name_compare 2043 0.4995 1501 0.4382 kernel-2.6.28.4_2.vmlinux rcu_irq_exit 2022 0.4944 1732 0.5056 kernel-2.6.28.4_2.vmlinux __ext4_get_inode_loc 2020 0.4939 4325 1.2626 libcrypto.so.0.9.8 /usr/lib/libcrypto.so.0.9.8 1965 0.4804 1384 0.4040 git patch_delta 1708 0.4176 984 0.2873 kernel-2.6.28.4_2.vmlinux rcu_sched_grace_period 1682 0.4112 727 0.2122 kernel-2.6.28.4_2.vmlinux sysfs_slab_alias 1659 0.4056 290 0.0847 git find_pack_entry_one 1480 0.3619 1307 0.3816 kernel-2.6.28.4_2.vmlinux ext4_writepage_trans_blocks Notice the memmove line, where the CPU did 7710 / 277 = 27.8 cycles per instruction, and compared to the total cycles spent inside the source code of GIT for this command, all the memmove() calls translates to (7710 * 100) / 14775 = 52.2% of this. Retesting with a GIT program compiled for gcov usage, I found out that the memmove() calls came from remove_index_entry_at() in read-cache.c, where we have: memmove(istate->cache + pos, istate->cache + pos + 1, (istate->cache_nr - pos) * sizeof(struct cache_entry )); remove_index_entry_at() is called 4902 times from check_updates() in unpack-trees.c, and each time called we move each cache_entry pointers (from the removed one) one step to the left. Since we have 28828 entries in the cache this time, and if we on average move half of them each time, we in total move approximately 4902 0.5 * 28828 * 4 = 282 629 712 bytes, or twice this amount if each pointer is 8 bytes (64 bit). OK, is seems that the function check_updates() is called 28 times, so the estimated guess above had been more correct if check_updates() had been called only once, but the point is: we get lots of bytes moved. To fix this, and use an O(N) algorithm instead, where N is the number of cache_entries, we delete/remove all entries in one loop through all entries. From a retest, the new remove_marked_cache_entries() from the patch below, ended up with the following output line from oprofile: 46 0.0105 15 0.0041 git remove_marked_cache_entries If we can trust the numbers from oprofile in this case, we saved approximately ((7710 - 46) * 20000) / (2 * 1000 * 1000 * 1000) = 0.077 seconds CPU time with this fix for this particular test. And notice that now the CPU did only 46 / 15 = 3.1 cycles/instruction. Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-18 23:18:03 +01:00			`else`
			`ce_array[j++] = ce_array[i];`
			`}`
			`istate->cache_changed = 1;`
			`istate->cache_nr = j;`
			`}`

Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`int remove_file_from_index(struct index_state istate, const char path)`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`{`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`int pos = index_name_pos(istate, path, strlen(path));`
[PATCH] update-cache --remove marks the path merged. When update-cache --remove is run, resolve unmerged state for the path. This is consistent with the update-cache --add behaviour. Essentially, the user is telling us how he wants to resolve the merge by running update-cache. Signed-off-by: Junio C Hamano <junkio@cox.net> Fixed to do the right thing at the end. Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-17 18:53:35 +02:00			`if (pos < 0)`
			`pos = -pos-1;`
Simplify cache API Earlier, add_file_to_index() invalidated the path in the cache-tree but remove_file_from_cache() did not, and the user of the latter needed to invalidate the entry himself. This led to a few bugs due to missed invalidate calls already. This patch makes the management of cache-tree less error prone by making more invalidate calls from lower level cache API functions. The rules are: - If you are going to write the index, you should either maintain cache_tree correctly. - If you cannot, alternatively you can remove the entire cache_tree by calling cache_tree_free() before you call write_cache(). - When you modify the index, cache_tree_invalidate_path() should be called with the path you are modifying, to discard the entry from the cache-tree structure. - The following cache API functions exported from read-cache.c (and the macro whose names have "cache" instead of "index") automatically call cache_tree_invalidate_path() for you: - remove_file_from_index(); - add_file_to_index(); - add_index_entry(); You can modify the index bypassing the above API functions (e.g. find an existing cache entry from the index and modify it in place). You need to call cache_tree_invalidate_path() yourself in such a case. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-14 05:33:11 +02:00			`cache_tree_invalidate_path(istate->cache_tree, path);`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`while (pos < istate->cache_nr && !strcmp(istate->cache[pos]->name, path))`
			`remove_index_entry_at(istate, pos);`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`return 0;`
			`}`

git add: respect core.filemode with unmerged entries When a merge left unmerged entries, git add failed to pick up the file mode from the index, when core.filemode == 0. If more than one unmerged entry is there, the order of stage preference is 2, 1, 3. Noticed by Johannes Sixt. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-29 19:32:46 +02:00			`static int compare_name(struct cache_entry ce, const char path, int namelen)`
			`{`
			`return namelen != ce_namelen(ce) \|\| memcmp(path, ce->name, namelen);`
			`}`

			`static int index_name_pos_also_unmerged(struct index_state *istate,`
			`const char *path, int namelen)`
			`{`
			`int pos = index_name_pos(istate, path, namelen);`
			`struct cache_entry *ce;`

			`if (pos >= 0)`
			`return pos;`

			`/* maybe unmerged? */`
			`pos = -1 - pos;`
			`if (pos >= istate->cache_nr \|\|`
			`compare_name((ce = istate->cache[pos]), path, namelen))`
			`return -1;`

			`/* order of preference: stage 2, 1, 3 */`
			`if (ce_stage(ce) == 1 && pos + 1 < istate->cache_nr &&`
			`ce_stage((ce = istate->cache[pos + 1])) == 2 &&`
			`!compare_name(ce, path, namelen))`
			`pos++;`
			`return pos;`
			`}`

Make git-add behave more sensibly in a case-insensitive environment This expands on the previous patch, and allows "git add" to sanely handle a filename that has changed case, keeping the case in the index constant, and avoiding aliases. In particular, if you have an index entry called "File", but the checked-out tree is case-corrupted and has an entry called "file" instead, doing a git add . (or naming "file" explicitly) will automatically notice that we have an alias, and will replace the name "file" with the existing index capitalization (ie "File"). However, if we actually have both a file called "File" and one called "file", and they don't have the same lstat() information (ie we're on a case-sensitive filesystem but have the "core.ignorecase" flag set), we will error out if we try to add them both. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-22 22:22:44 +01:00			`static int different_name(struct cache_entry ce, struct cache_entry alias)`
			`{`
			`int len = ce_namelen(ce);`
			`return ce_namelen(alias) != len \|\| memcmp(ce->name, alias->name, len);`
			`}`

			`/*`
			`* If we add a filename that aliases in the cache, we will use the`
			`* name that we already have - but we don't want to update the same`
			`* alias twice, because that implies that there were actually two`
			`* different files with aliasing names!`
			`*`
			`* So we use the CE_ADDED flag to verify that the alias was an old`
			`* one before we accept it as`
			`*/`
			`static struct cache_entry create_alias_ce(struct cache_entry ce, struct cache_entry *alias)`
			`{`
			`int len;`
			`struct cache_entry *new;`

			`if (alias->ce_flags & CE_ADDED)`
			`die("Will not add file alias '%s' ('%s' already exists in index)", ce->name, alias->name);`

			`/* Ok, create the new entry using the name of the existing alias */`
			`len = ce_namelen(alias);`
			`new = xcalloc(1, cache_entry_size(len));`
			`memcpy(new->name, alias->name, len);`
			`copy_cache_entry(new, ce);`
			`free(ce);`
			`return new;`
			`}`

git-add --intent-to-add (-N) This adds "--intent-to-add" option to "git add". This is to let the system know that you will tell it the final contents to be staged later, iow, just be aware of the presense of the path with the type of the blob for now. It is implemented by staging an empty blob as the content. With this sequence: $ git reset --hard $ edit newfile $ git add -N newfile $ edit newfile oldfile $ git diff the diff will show all changes relative to the current commit. Then you can do: $ git commit -a ;# commit everything or $ git commit oldfile ;# only oldfile, newfile not yet added to pretend you are working with an index-free system like CVS. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-21 10:44:53 +02:00			`static void record_intent_to_add(struct cache_entry *ce)`
			`{`
			`unsigned char sha1[20];`
			`if (write_sha1_file("", 0, blob_type, sha1))`
			`die("cannot create an empty blob in the object database");`
			`hashcpy(ce->sha1, sha1);`
			`}`

"git-add -n -u" should not add but just report Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-21 21:04:34 +02:00			`int add_to_index(struct index_state istate, const char path, struct stat *st, int flags)`
Make git-mv a builtin This also moves add_file_to_index() to read-cache.c. Oh, and while touching builtin-add.c, it also removes a duplicate git_config() call. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 03:52:35 +02:00			`{`
"git-add -n -u" should not add but just report Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-21 21:04:34 +02:00			`int size, namelen, was_same;`
Avoid some unnecessary lstat() calls The commit sequence used to do if (file_exists(p->path)) add_file_to_cache(p->path, 0); where both "file_exists()" and "add_file_to_cache()" needed to do a lstat() on the path to do their work. This cuts down 'lstat()' calls for the partial commit case by two for each path we know about (because we do this twice per path). Just move the lstat() to the caller instead (that's all that "file_exists()" really does), and pass the stat information down to the add_to_cache() function. This essentially makes 'add_to_index()' the core function that adds a path to the index, getting the index pointer, the pathname and the stat information as arguments. There are then shorthand helper functions that use this core function: - 'add_to_cache()' is just 'add_to_index()' with the default index - 'add_file_to_cache/index()' is the same, but does the lstat() call itself, so you can pass just the pathname if you don't already have the stat information available. So old users of the 'add_file_to_xyzzy()' are essentially left unchanged, and this just exposes the more generic helper function that can take existing stat information into account. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-09 18:11:43 +02:00			`mode_t st_mode = st->st_mode;`
When adding files to the index, add support for case-independent matches This simplifies the matching case of "I already have this file and it is up-to-date" and makes it do the right thing in the face of case-insensitive aliases. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-22 21:19:49 +01:00			`struct cache_entry ce, alias;`
ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID Previously CE_MATCH_IGNORE_VALID flag is used by both valid and skip-worktree bits. While the two bits have similar behaviour, sharing this flag means "git update-index --really-refresh" will ignore skip-worktree while it should not. Instead another flag is introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only applies to valid bit. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-14 12:43:58 +01:00			`unsigned ce_option = CE_MATCH_IGNORE_VALID\|CE_MATCH_IGNORE_SKIP_WORKTREE\|CE_MATCH_RACY_IS_DIRTY;`
"git-add -n -u" should not add but just report Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-21 21:04:34 +02:00			`int verbose = flags & (ADD_CACHE_VERBOSE \| ADD_CACHE_PRETEND);`
			`int pretend = flags & ADD_CACHE_PRETEND;`
git-add --intent-to-add (-N) This adds "--intent-to-add" option to "git add". This is to let the system know that you will tell it the final contents to be staged later, iow, just be aware of the presense of the path with the type of the blob for now. It is implemented by staging an empty blob as the content. With this sequence: $ git reset --hard $ edit newfile $ git add -N newfile $ edit newfile oldfile $ git diff the diff will show all changes relative to the current commit. Then you can do: $ git commit -a ;# commit everything or $ git commit oldfile ;# only oldfile, newfile not yet added to pretend you are working with an index-free system like CVS. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-21 10:44:53 +02:00			`int intent_only = flags & ADD_CACHE_INTENT;`
			`int add_option = (ADD_CACHE_OK_TO_ADD\|ADD_CACHE_OK_TO_REPLACE\|`
			`(intent_only ? ADD_CACHE_NEW_ONLY : 0));`
Make git-mv a builtin This also moves add_file_to_index() to read-cache.c. Oh, and while touching builtin-add.c, it also removes a duplicate git_config() call. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 03:52:35 +02:00
Avoid some unnecessary lstat() calls The commit sequence used to do if (file_exists(p->path)) add_file_to_cache(p->path, 0); where both "file_exists()" and "add_file_to_cache()" needed to do a lstat() on the path to do their work. This cuts down 'lstat()' calls for the partial commit case by two for each path we know about (because we do this twice per path). Just move the lstat() to the caller instead (that's all that "file_exists()" really does), and pass the stat information down to the add_to_cache() function. This essentially makes 'add_to_index()' the core function that adds a path to the index, getting the index pointer, the pathname and the stat information as arguments. There are then shorthand helper functions that use this core function: - 'add_to_cache()' is just 'add_to_index()' with the default index - 'add_file_to_cache/index()' is the same, but does the lstat() call itself, so you can pass just the pathname if you don't already have the stat information available. So old users of the 'add_file_to_xyzzy()' are essentially left unchanged, and this just exposes the more generic helper function that can take existing stat information into account. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-09 18:11:43 +02:00			`if (!S_ISREG(st_mode) && !S_ISLNK(st_mode) && !S_ISDIR(st_mode))`
Make the exit code of add_file_to_index actually useful Update the programs which used the function (as add_file_to_cache). Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-12 19:57:45 +02:00			`return error("%s: can only add regular files, symbolic links or git-directories", path);`
Make git-mv a builtin This also moves add_file_to_index() to read-cache.c. Oh, and while touching builtin-add.c, it also removes a duplicate git_config() call. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 03:52:35 +02:00
			`namelen = strlen(path);`
Avoid some unnecessary lstat() calls The commit sequence used to do if (file_exists(p->path)) add_file_to_cache(p->path, 0); where both "file_exists()" and "add_file_to_cache()" needed to do a lstat() on the path to do their work. This cuts down 'lstat()' calls for the partial commit case by two for each path we know about (because we do this twice per path). Just move the lstat() to the caller instead (that's all that "file_exists()" really does), and pass the stat information down to the add_to_cache() function. This essentially makes 'add_to_index()' the core function that adds a path to the index, getting the index pointer, the pathname and the stat information as arguments. There are then shorthand helper functions that use this core function: - 'add_to_cache()' is just 'add_to_index()' with the default index - 'add_file_to_cache/index()' is the same, but does the lstat() call itself, so you can pass just the pathname if you don't already have the stat information available. So old users of the 'add_file_to_xyzzy()' are essentially left unchanged, and this just exposes the more generic helper function that can take existing stat information into account. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-09 18:11:43 +02:00			`if (S_ISDIR(st_mode)) {`
Teach directory traversal about subprojects This is the promised cleaned-up version of teaching directory traversal (ie the "read_directory()" logic) about subprojects. That makes "git add" understand to add/update subprojects. It now knows to look at the index file to see if a directory is marked as a subproject, and use that as information as whether it should be recursed into or not. It also generally cleans up the handling of directory entries when traversing the working tree, by splitting up the decision-making process into small functions of their own, and adding a fair number of comments. Finally, it teaches "add_file_to_cache()" that directory names can have slashes at the end, since the directory traversal adds them to make the difference between a file and a directory clear (it always did that, but my previous too-ugly-to-apply subproject patch had a totally different path for subproject directories and avoided the slash for that case). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-11 23:49:44 +02:00			`while (namelen && path[namelen-1] == '/')`
			`namelen--;`
			`}`
Make git-mv a builtin This also moves add_file_to_index() to read-cache.c. Oh, and while touching builtin-add.c, it also removes a duplicate git_config() call. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 03:52:35 +02:00			`size = cache_entry_size(namelen);`
			`ce = xcalloc(1, size);`
			`memcpy(ce->name, path, namelen);`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`ce->ce_namelen = namelen;`
git-add --intent-to-add (-N) This adds "--intent-to-add" option to "git add". This is to let the system know that you will tell it the final contents to be staged later, iow, just be aware of the presense of the path with the type of the blob for now. It is implemented by staging an empty blob as the content. With this sequence: $ git reset --hard $ edit newfile $ git add -N newfile $ edit newfile oldfile $ git diff the diff will show all changes relative to the current commit. Then you can do: $ git commit -a ;# commit everything or $ git commit oldfile ;# only oldfile, newfile not yet added to pretend you are working with an index-free system like CVS. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-21 10:44:53 +02:00			`if (!intent_only)`
			`fill_stat_cache_info(ce, st);`
git add --intent-to-add: fix removal of cached emptiness This uses the extended index flag mechanism introduced earlier to mark the entries added to the index via "git add -N" with CE_INTENT_TO_ADD. The logic to detect an "intent to add" entry for the purpose of allowing "git rm --cached $path" is tightened to check not just for a staged empty blob, but with the CE_INTENT_TO_ADD bit. This protects an empty blob that was explicitly added and then modified in the work tree from being dropped with this sequence: $ >empty $ git add empty $ echo "non empty" >empty $ git rm --cached empty Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-11-29 04:55:25 +01:00			`else`
			`ce->ce_flags \|= CE_INTENT_TO_ADD;`
Make git-mv a builtin This also moves add_file_to_index() to read-cache.c. Oh, and while touching builtin-add.c, it also removes a duplicate git_config() call. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 03:52:35 +02:00
Add core.symlinks to mark filesystems that do not support symbolic links. Some file systems that can host git repositories and their working copies do not support symbolic links. But then if the repository contains a symbolic link, it is impossible to check out the working copy. This patch enables partial support of symbolic links so that it is possible to check out a working copy on such a file system. A new flag core.symlinks (which is true by default) can be set to false to indicate that the filesystem does not support symbolic links. In this case, symbolic links that exist in the trees are checked out as small plain files, and checking in modifications of these files preserve the symlink property in the database (as long as an entry exists in the index). Of course, this does not magically make symbolic links work on such defective file systems; hence, this solution does not help if the working copy relies on that an entry is a real symbolic link. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-02 22:11:30 +01:00			`if (trust_executable_bit && has_symlinks)`
Avoid some unnecessary lstat() calls The commit sequence used to do if (file_exists(p->path)) add_file_to_cache(p->path, 0); where both "file_exists()" and "add_file_to_cache()" needed to do a lstat() on the path to do their work. This cuts down 'lstat()' calls for the partial commit case by two for each path we know about (because we do this twice per path). Just move the lstat() to the caller instead (that's all that "file_exists()" really does), and pass the stat information down to the add_to_cache() function. This essentially makes 'add_to_index()' the core function that adds a path to the index, getting the index pointer, the pathname and the stat information as arguments. There are then shorthand helper functions that use this core function: - 'add_to_cache()' is just 'add_to_index()' with the default index - 'add_file_to_cache/index()' is the same, but does the lstat() call itself, so you can pass just the pathname if you don't already have the stat information available. So old users of the 'add_file_to_xyzzy()' are essentially left unchanged, and this just exposes the more generic helper function that can take existing stat information into account. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-09 18:11:43 +02:00			`ce->ce_mode = create_ce_mode(st_mode);`
Do not take mode bits from index after type change. When we do not trust executable bit from lstat(2), we copied existing ce_mode bits without checking if the filesystem object is a regular file (which is the only thing we apply the "trust executable bit" business) nor if the blob in the index is a regular file (otherwise, we should do the same as registering a new regular file, which is to default non-executable). Noticed by Johannes Sixt. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-17 07:43:48 +01:00			`else {`
Add core.symlinks to mark filesystems that do not support symbolic links. Some file systems that can host git repositories and their working copies do not support symbolic links. But then if the repository contains a symbolic link, it is impossible to check out the working copy. This patch enables partial support of symbolic links so that it is possible to check out a working copy on such a file system. A new flag core.symlinks (which is true by default) can be set to false to indicate that the filesystem does not support symbolic links. In this case, symbolic links that exist in the trees are checked out as small plain files, and checking in modifications of these files preserve the symlink property in the database (as long as an entry exists in the index). Of course, this does not magically make symbolic links work on such defective file systems; hence, this solution does not help if the working copy relies on that an entry is a real symbolic link. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-02 22:11:30 +01:00			`/* If there is an existing entry, pick the mode bits and type`
			`* from it, otherwise assume unexecutable regular file.`
Make git-mv a builtin This also moves add_file_to_index() to read-cache.c. Oh, and while touching builtin-add.c, it also removes a duplicate git_config() call. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 03:52:35 +02:00			`*/`
Do not take mode bits from index after type change. When we do not trust executable bit from lstat(2), we copied existing ce_mode bits without checking if the filesystem object is a regular file (which is the only thing we apply the "trust executable bit" business) nor if the blob in the index is a regular file (otherwise, we should do the same as registering a new regular file, which is to default non-executable). Noticed by Johannes Sixt. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-17 07:43:48 +01:00			`struct cache_entry *ent;`
git add: respect core.filemode with unmerged entries When a merge left unmerged entries, git add failed to pick up the file mode from the index, when core.filemode == 0. If more than one unmerged entry is there, the order of stage preference is 2, 1, 3. Noticed by Johannes Sixt. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-29 19:32:46 +02:00			`int pos = index_name_pos_also_unmerged(istate, path, namelen);`
Do not take mode bits from index after type change. When we do not trust executable bit from lstat(2), we copied existing ce_mode bits without checking if the filesystem object is a regular file (which is the only thing we apply the "trust executable bit" business) nor if the blob in the index is a regular file (otherwise, we should do the same as registering a new regular file, which is to default non-executable). Noticed by Johannes Sixt. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-17 07:43:48 +01:00
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`ent = (0 <= pos) ? istate->cache[pos] : NULL;`
Avoid some unnecessary lstat() calls The commit sequence used to do if (file_exists(p->path)) add_file_to_cache(p->path, 0); where both "file_exists()" and "add_file_to_cache()" needed to do a lstat() on the path to do their work. This cuts down 'lstat()' calls for the partial commit case by two for each path we know about (because we do this twice per path). Just move the lstat() to the caller instead (that's all that "file_exists()" really does), and pass the stat information down to the add_to_cache() function. This essentially makes 'add_to_index()' the core function that adds a path to the index, getting the index pointer, the pathname and the stat information as arguments. There are then shorthand helper functions that use this core function: - 'add_to_cache()' is just 'add_to_index()' with the default index - 'add_file_to_cache/index()' is the same, but does the lstat() call itself, so you can pass just the pathname if you don't already have the stat information available. So old users of the 'add_file_to_xyzzy()' are essentially left unchanged, and this just exposes the more generic helper function that can take existing stat information into account. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-09 18:11:43 +02:00			`ce->ce_mode = ce_mode_from_stat(ent, st_mode);`
Make git-mv a builtin This also moves add_file_to_index() to read-cache.c. Oh, and while touching builtin-add.c, it also removes a duplicate git_config() call. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 03:52:35 +02:00			`}`

Support case folding for git add when core.ignorecase=true When MyDir/ABC/filea.txt is added to Git, the disk directory MyDir/ABC/ is renamed to mydir/aBc/, and then mydir/aBc/fileb.txt is added, the index will contain MyDir/ABC/filea.txt and mydir/aBc/fileb.txt. Although the earlier portions of this patch series account for those differences in case, this patch makes the pathing consistent by folding the case of newly added files against the first file added with that path. In read-cache.c's add_to_index(), the index_name_exists() support used for git status's case insensitive directory lookups is used to find the proper directory case according to what the user already checked in. That is, MyDir/ABC/'s case is used to alter the stored path for fileb.txt to MyDir/ABC/fileb.txt (instead of mydir/aBc/fileb.txt). This is especially important when cloning a repository to a case sensitive file system. MyDir/ABC/ and mydir/aBc/ exist in the same directory on a Windows machine, but on Linux, the files exist in two separate directories. The update to add_to_index(), in effect, treats a Windows file system as case sensitive by making path case consistent. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:45 +02:00			`/* When core.ignorecase=true, determine if a directory of the same name but differing`
			`* case already exists within the Git repository. If it does, ensure the directory`
			`* case of the file being added to the repository matches (is folded into) the existing`
			`* entry's directory case.`
			`*/`
			`if (ignore_case) {`
			`const char *startPtr = ce->name;`
			`const char *ptr = startPtr;`
			`while (*ptr) {`
			`while (ptr && ptr != '/')`
			`++ptr;`
			`if (*ptr == '/') {`
			`struct cache_entry *foundce;`
			`++ptr;`
name-hash: stop storing trailing '/' on paths in index_state.dir_hash When 5102c617 (Add case insensitivity support for directories when using git status, 2010-10-03) added directories to the name-hash there was only a single hash table in which both real cache entries and leading directory prefixes were registered. To distinguish between the two types of entries, directories were stored with a trailing '/'. 2092678c (name-hash.c: fix endless loop with core.ignorecase=true, 2013-02-28), however, moved directories to a separate hash table (index_state.dir_hash) but retained the (now) redundant trailing '/', thus callers continue to bear the burden of ensuring the slash's presence before searching the index for a directory. Eliminate this redundancy by storing paths in the dir-hash without the trailing '/'. An important benefit of this change is that it eliminates undocumented and dangerous behavior of dir.c:directory_exists_in_index_icase() in which it assumes not only that it can validly access one character beyond the end of its incoming directory argument, but also that that character will unconditionally be a '/'. This perilous behavior was "tolerated" because the string passed in by its lone caller always had a '/' in that position, however, things broke [1] when 2eac2a4c (ls-files -k: a directory only can be killed if the index has a non-directory, 2013-08-15) added a new caller which failed to respect the undocumented assumption. [1]: http://thread.gmane.org/gmane.comp.version-control.git/232727 Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-09-17 09:06:16 +02:00			`foundce = index_dir_exists(istate, ce->name, ptr - ce->name - 1);`
Support case folding for git add when core.ignorecase=true When MyDir/ABC/filea.txt is added to Git, the disk directory MyDir/ABC/ is renamed to mydir/aBc/, and then mydir/aBc/fileb.txt is added, the index will contain MyDir/ABC/filea.txt and mydir/aBc/fileb.txt. Although the earlier portions of this patch series account for those differences in case, this patch makes the pathing consistent by folding the case of newly added files against the first file added with that path. In read-cache.c's add_to_index(), the index_name_exists() support used for git status's case insensitive directory lookups is used to find the proper directory case according to what the user already checked in. That is, MyDir/ABC/'s case is used to alter the stored path for fileb.txt to MyDir/ABC/fileb.txt (instead of mydir/aBc/fileb.txt). This is especially important when cloning a repository to a case sensitive file system. MyDir/ABC/ and mydir/aBc/ exist in the same directory on a Windows machine, but on Linux, the files exist in two separate directories. The update to add_to_index(), in effect, treats a Windows file system as case sensitive by making path case consistent. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:45 +02:00			`if (foundce) {`
			`memcpy((void *)startPtr, foundce->name + (startPtr - ce->name), ptr - startPtr);`
			`startPtr = ptr;`
			`}`
			`}`
			`}`
			`}`

employ new explicit "exists in index?" API Each caller of index_name_exists() knows whether it is looking for a directory or a file, and can avoid the unnecessary indirection of index_name_exists() by instead calling index_dir_exists() or index_file_exists() directly. Invoking the appropriate search function explicitly will allow a subsequent patch to relieve callers of the artificial burden of having to add a trailing '/' to the pathname given to index_dir_exists(). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-09-17 09:06:15 +02:00			`alias = index_file_exists(istate, ce->name, ce_namelen(ce), ignore_case);`
Avoid some unnecessary lstat() calls The commit sequence used to do if (file_exists(p->path)) add_file_to_cache(p->path, 0); where both "file_exists()" and "add_file_to_cache()" needed to do a lstat() on the path to do their work. This cuts down 'lstat()' calls for the partial commit case by two for each path we know about (because we do this twice per path). Just move the lstat() to the caller instead (that's all that "file_exists()" really does), and pass the stat information down to the add_to_cache() function. This essentially makes 'add_to_index()' the core function that adds a path to the index, getting the index pointer, the pathname and the stat information as arguments. There are then shorthand helper functions that use this core function: - 'add_to_cache()' is just 'add_to_index()' with the default index - 'add_file_to_cache/index()' is the same, but does the lstat() call itself, so you can pass just the pathname if you don't already have the stat information available. So old users of the 'add_file_to_xyzzy()' are essentially left unchanged, and this just exposes the more generic helper function that can take existing stat information into account. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-09 18:11:43 +02:00			`if (alias && !ce_stage(alias) && !ie_match_stat(istate, alias, st, ce_option)) {`
add_file_to_index: skip rehashing if the cached stat already matches An earlier commit 366bfcb6 broke git-add by moving read_cache() call down, because it wanted the directory walking code to grab paths that are already in the index. The change serves its purpose, but introduces a regression because the responsibility of avoiding unnecessary reindexing by matching the cached stat is shifted nowhere. This makes it the job of add_file_to_index() function. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-07-31 02:12:58 +02:00			`/* Nothing changed, really */`
			`free(ce);`
Make ce_uptodate() trustworthy again The rule has always been that a cache entry that is ce_uptodate(ce) means that we already have checked the work tree entity and we know there is no change in the work tree compared to the index, and nobody should have to double check. Note that false ce_uptodate(ce) does not mean it is known to be dirty---it only means we don't know if it is clean. There are a few codepaths (refresh-index and preload-index are among them) that mark a cache entry as up-to-date based solely on the return value from ie_match_stat(); this function uses lstat() to see if the work tree entity has been touched, and for a submodule entry, if its HEAD points at the same commit as the commit recorded in the index of the superproject (a submodule that is not even cloned is considered clean). A submodule is no longer considered unmodified merely because its HEAD matches the index of the superproject these days, in order to prevent people from forgetting to commit in the submodule and updating the superproject index with the new submodule commit, before commiting the state in the superproject. However, the patch to do so didn't update the codepath that marks cache entries up-to-date based on the updated definition and instead worked it around by saying "we don't trust the return value of ce_uptodate() for submodules." This makes ce_uptodate() trustworthy again by not marking submodule entries up-to-date. The next step _could_ be to introduce a few "in-core" flag bits to cache_entry structure to record "this entry is _known_ to be dirty", call is_submodule_modified() from ie_match_stat(), and use these new bits to avoid running this rather expensive check more than once, but that can be a separate patch. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-24 09:10:20 +01:00			`if (!S_ISGITLINK(alias->ce_mode))`
			`ce_mark_uptodate(alias);`
Make git-add behave more sensibly in a case-insensitive environment This expands on the previous patch, and allows "git add" to sanely handle a filename that has changed case, keeping the case in the index constant, and avoiding aliases. In particular, if you have an index entry called "File", but the checked-out tree is case-corrupted and has an entry called "file" instead, doing a git add . (or naming "file" explicitly) will automatically notice that we have an alias, and will replace the name "file" with the existing index capitalization (ie "File"). However, if we actually have both a file called "File" and one called "file", and they don't have the same lstat() information (ie we're on a case-sensitive filesystem but have the "core.ignorecase" flag set), we will error out if we try to add them both. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-22 22:22:44 +01:00			`alias->ce_flags \|= CE_ADDED;`
add_file_to_index: skip rehashing if the cached stat already matches An earlier commit 366bfcb6 broke git-add by moving read_cache() call down, because it wanted the directory walking code to grab paths that are already in the index. The change serves its purpose, but introduces a regression because the responsibility of avoiding unnecessary reindexing by matching the cached stat is shifted nowhere. This makes it the job of add_file_to_index() function. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-07-31 02:12:58 +02:00			`return 0;`
			`}`
git-add --intent-to-add (-N) This adds "--intent-to-add" option to "git add". This is to let the system know that you will tell it the final contents to be staged later, iow, just be aware of the presense of the path with the type of the blob for now. It is implemented by staging an empty blob as the content. With this sequence: $ git reset --hard $ edit newfile $ git add -N newfile $ edit newfile oldfile $ git diff the diff will show all changes relative to the current commit. Then you can do: $ git commit -a ;# commit everything or $ git commit oldfile ;# only oldfile, newfile not yet added to pretend you are working with an index-free system like CVS. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-21 10:44:53 +02:00			`if (!intent_only) {`
index_fd(): turn write_object and format_check arguments into one flag The "format_check" parameter tucked after the existing parameters is too ugly an afterthought to live in any reasonable API. Combine it with the other boolean parameter "write_object" into a single "flags" parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-08 10:47:33 +02:00			`if (index_path(ce->sha1, path, st, HASH_WRITE_OBJECT))`
git-add --intent-to-add (-N) This adds "--intent-to-add" option to "git add". This is to let the system know that you will tell it the final contents to be staged later, iow, just be aware of the presense of the path with the type of the blob for now. It is implemented by staging an empty blob as the content. With this sequence: $ git reset --hard $ edit newfile $ git add -N newfile $ edit newfile oldfile $ git diff the diff will show all changes relative to the current commit. Then you can do: $ git commit -a ;# commit everything or $ git commit oldfile ;# only oldfile, newfile not yet added to pretend you are working with an index-free system like CVS. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-21 10:44:53 +02:00			`return error("unable to index file %s", path);`
			`} else`
			`record_intent_to_add(ce);`

Make git-add behave more sensibly in a case-insensitive environment This expands on the previous patch, and allows "git add" to sanely handle a filename that has changed case, keeping the case in the index constant, and avoiding aliases. In particular, if you have an index entry called "File", but the checked-out tree is case-corrupted and has an entry called "file" instead, doing a git add . (or naming "file" explicitly) will automatically notice that we have an alias, and will replace the name "file" with the existing index capitalization (ie "File"). However, if we actually have both a file called "File" and one called "file", and they don't have the same lstat() information (ie we're on a case-sensitive filesystem but have the "core.ignorecase" flag set), we will error out if we try to add them both. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-22 22:22:44 +01:00			`if (ignore_case && alias && different_name(ce, alias))`
			`ce = create_alias_ce(ce, alias);`
			`ce->ce_flags \|= CE_ADDED;`
"git-add -n -u" should not add but just report Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-21 21:04:34 +02:00
read-cache.c: typofix Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-17 03:48:58 +02:00			`/* It was suspected to be racily clean, but it turns out to be Ok */`
"git-add -n -u" should not add but just report Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-21 21:04:34 +02:00			`was_same = (alias &&`
			`!ce_stage(alias) &&`
			`!hashcmp(alias->sha1, ce->sha1) &&`
			`ce->ce_mode == alias->ce_mode);`

			`if (pretend)`
			`;`
git-add --intent-to-add (-N) This adds "--intent-to-add" option to "git add". This is to let the system know that you will tell it the final contents to be staged later, iow, just be aware of the presense of the path with the type of the blob for now. It is implemented by staging an empty blob as the content. With this sequence: $ git reset --hard $ edit newfile $ git add -N newfile $ edit newfile oldfile $ git diff the diff will show all changes relative to the current commit. Then you can do: $ git commit -a ;# commit everything or $ git commit oldfile ;# only oldfile, newfile not yet added to pretend you are working with an index-free system like CVS. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-21 10:44:53 +02:00			`else if (add_index_entry(istate, ce, add_option))`
Make the exit code of add_file_to_index actually useful Update the programs which used the function (as add_file_to_cache). Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-12 19:57:45 +02:00			`return error("unable to add %s to index",path);`
"git-add -n -u" should not add but just report Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-21 21:04:34 +02:00			`if (verbose && !was_same)`
Make git-mv a builtin This also moves add_file_to_index() to read-cache.c. Oh, and while touching builtin-add.c, it also removes a duplicate git_config() call. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 03:52:35 +02:00			`printf("add '%s'\n", path);`
			`return 0;`
			`}`

"git-add -n -u" should not add but just report Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-21 21:04:34 +02:00			`int add_file_to_index(struct index_state istate, const char path, int flags)`
Avoid some unnecessary lstat() calls The commit sequence used to do if (file_exists(p->path)) add_file_to_cache(p->path, 0); where both "file_exists()" and "add_file_to_cache()" needed to do a lstat() on the path to do their work. This cuts down 'lstat()' calls for the partial commit case by two for each path we know about (because we do this twice per path). Just move the lstat() to the caller instead (that's all that "file_exists()" really does), and pass the stat information down to the add_to_cache() function. This essentially makes 'add_to_index()' the core function that adds a path to the index, getting the index pointer, the pathname and the stat information as arguments. There are then shorthand helper functions that use this core function: - 'add_to_cache()' is just 'add_to_index()' with the default index - 'add_file_to_cache/index()' is the same, but does the lstat() call itself, so you can pass just the pathname if you don't already have the stat information available. So old users of the 'add_file_to_xyzzy()' are essentially left unchanged, and this just exposes the more generic helper function that can take existing stat information into account. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-09 18:11:43 +02:00			`{`
			`struct stat st;`
			`if (lstat(path, &st))`
Convert existing die(..., strerror(errno)) to die_errno() Change calls to die(..., strerror(errno)) to use the new die_errno(). In the process, also make slight style adjustments: at least state _something_ about the function that failed (instead of just printing the pathname), and put paths in single quotes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-27 17:58:46 +02:00			`die_errno("unable to stat '%s'", path);`
"git-add -n -u" should not add but just report Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-21 21:04:34 +02:00			`return add_to_index(istate, path, &st, flags);`
Avoid some unnecessary lstat() calls The commit sequence used to do if (file_exists(p->path)) add_file_to_cache(p->path, 0); where both "file_exists()" and "add_file_to_cache()" needed to do a lstat() on the path to do their work. This cuts down 'lstat()' calls for the partial commit case by two for each path we know about (because we do this twice per path). Just move the lstat() to the caller instead (that's all that "file_exists()" really does), and pass the stat information down to the add_to_cache() function. This essentially makes 'add_to_index()' the core function that adds a path to the index, getting the index pointer, the pathname and the stat information as arguments. There are then shorthand helper functions that use this core function: - 'add_to_cache()' is just 'add_to_index()' with the default index - 'add_file_to_cache/index()' is the same, but does the lstat() call itself, so you can pass just the pathname if you don't already have the stat information available. So old users of the 'add_file_to_xyzzy()' are essentially left unchanged, and this just exposes the more generic helper function that can take existing stat information into account. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-09 18:11:43 +02:00			`}`

Move make_cache_entry() from merge-recursive.c into read-cache.c The function make_cache_entry() is too useful to be hidden away in merge-recursive. So move it to libgit.a (exposing it via cache.h). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-11 05:17:28 +02:00			`struct cache_entry *make_cache_entry(unsigned int mode,`
			`const unsigned char sha1, const char path, int stage,`
			`int refresh)`
			`{`
			`int size, len;`
			`struct cache_entry *ce;`

print an error message for invalid path If verification of path failed, it is always better to print an error message saying this than relying on the caller function to print a meaningful error message (especially when the callee already prints error message for another situation). Because the callers of add_index_entry_with_check() did not print any error message, it resulted that the user would not notice the problem when checkout of an invalid path failed. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-11 18:39:37 +02:00			`if (!verify_path(path)) {`
			`error("Invalid path '%s'", path);`
Move make_cache_entry() from merge-recursive.c into read-cache.c The function make_cache_entry() is too useful to be hidden away in merge-recursive. So move it to libgit.a (exposing it via cache.h). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-11 05:17:28 +02:00			`return NULL;`
print an error message for invalid path If verification of path failed, it is always better to print an error message saying this than relying on the caller function to print a meaningful error message (especially when the callee already prints error message for another situation). Because the callers of add_index_entry_with_check() did not print any error message, it resulted that the user would not notice the problem when checkout of an invalid path failed. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-11 18:39:37 +02:00			`}`
Move make_cache_entry() from merge-recursive.c into read-cache.c The function make_cache_entry() is too useful to be hidden away in merge-recursive. So move it to libgit.a (exposing it via cache.h). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-11 05:17:28 +02:00
			`len = strlen(path);`
			`size = cache_entry_size(len);`
			`ce = xcalloc(1, size);`

			`hashcpy(ce->sha1, sha1);`
			`memcpy(ce->name, path, len);`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`ce->ce_flags = create_ce_flags(stage);`
			`ce->ce_namelen = len;`
Move make_cache_entry() from merge-recursive.c into read-cache.c The function make_cache_entry() is too useful to be hidden away in merge-recursive. So move it to libgit.a (exposing it via cache.h). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-11 05:17:28 +02:00			`ce->ce_mode = create_ce_mode(mode);`

			`if (refresh)`
			`return refresh_cache_entry(ce, 0);`

			`return ce;`
			`}`

Convert "struct cache_entry " to "const ..." wherever possible I attempted to make index_state->cache[] a "const struct cache_entry " to find out how existing entries in index are modified and where. The question I have is what do we do if we really need to keep track of on-disk changes in the index. The result is - diff-lib.c: setting CE_UPTODATE - name-hash.c: setting CE_HASHED - preload-index.c, read-cache.c, unpack-trees.c and builtin/update-index: obvious - entry.c: write_entry() may refresh the checked out entry via fill_stat_cache_info(). This causes "non-const struct cache_entry " in builtin/apply.c, builtin/checkout-index.c and builtin/checkout.c - builtin/ls-files.c: --with-tree changes stagemask and may set CE_UPDATE Of these, write_entry() and its call sites are probably most interesting because it modifies on-disk info. But this is stat info and can be retrieved via refresh, at least for porcelain commands. Other just uses ce_flags for local purposes. So, keeping track of "dirty" entries is just a matter of setting a flag in index modification functions exposed by read-cache.c. Except unpack-trees, the rest of the code base does not do anything funny behind read-cache's back. The actual patch is less valueable than the summary above. But if anyone wants to re-identify the above sites. Applying this patch, then this: diff --git a/cache.h b/cache.h index 430d021..1692891 100644 --- a/cache.h +++ b/cache.h @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode) #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1) struct index_state { - struct cache_entry cache; + const struct cache_entry cache; unsigned int version; unsigned int cache_nr, cache_alloc, cache_changed; struct string_list *resolve_undo; will help quickly identify them without bogus warnings. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-09 17:29:00 +02:00			`int ce_same_name(const struct cache_entry a, const struct cache_entry b)`
When inserting a index entry of stage 0, remove all old unmerged entries. This allows you to actually tell git that you've resolved a conflict. 2005-04-16 21:05:45 +02:00			`{`
			`int len = ce_namelen(a);`
			`return ce_namelen(b) == len && !memcmp(a->name, b->name, len);`
			`}`

Convert ce_path_match() to use struct pathspec Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-17 13:43:07 +01:00			`int ce_path_match(const struct cache_entry ce, const struct pathspec pathspec)`
Make "ce_match_path()" a generic helper function ... and make git-diff-files use it too. This all _should_ make the diffcore-pathspec.c phase unnecessary, since the diff'ers now all do the path matching early interally. 2005-07-15 01:55:06 +02:00			`{`
Convert ce_path_match() to use match_pathspec_depth() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-15 16:02:50 +01:00			`return match_pathspec_depth(pathspec, ce->name, ce_namelen(ce), 0, NULL);`
Make "ce_match_path()" a generic helper function ... and make git-diff-files use it too. This all _should_ make the diffcore-pathspec.c phase unnecessary, since the diff'ers now all do the path matching early interally. 2005-07-15 01:55:06 +02:00			`}`

Prevent bogus paths from being added to the index. With this one, it's now a fatal error to try to add a pathname that cannot be added with "git add", i.e. [torvalds@g5 git]$ git add .git/config fatal: unable to add .git/config to index and [torvalds@g5 git]$ git add foo/../bar fatal: unable to add foo/../bar to index instead of the old "Ignoring path xyz" warning that would end up silently succeeding on any other paths. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 21:07:31 +02:00			`/*`
			`* We fundamentally don't like some paths: we don't want`
			`* dot or dot-dot anywhere, and for obvious reasons don't`
			`* want to recurse into ".git" either.`
			`*`
			`* Also, we don't want double slashes or slashes at the`
			`* end that can make pathnames ambiguous.`
			`*/`
			`static int verify_dotfile(const char *rest)`
			`{`
			`/*`
			`* The first character was '.', but that`
			`* has already been discarded, we now test`
			`* the rest.`
			`*/`
verify_dotfile(): do not assume '/' is the path seperator verify_dotfile() currently assumes that the path seperator is '/', but on Windows it can also be '\\', so use is_dir_sep() instead. Signed-off-by: Theo Niessink <theo@taletn.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-08 14:04:41 +02:00
Prevent bogus paths from being added to the index. With this one, it's now a fatal error to try to add a pathname that cannot be added with "git add", i.e. [torvalds@g5 git]$ git add .git/config fatal: unable to add .git/config to index and [torvalds@g5 git]$ git add foo/../bar fatal: unable to add foo/../bar to index instead of the old "Ignoring path xyz" warning that would end up silently succeeding on any other paths. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 21:07:31 +02:00			`/* "." is not allowed */`
verify_dotfile(): do not assume '/' is the path seperator verify_dotfile() currently assumes that the path seperator is '/', but on Windows it can also be '\\', so use is_dir_sep() instead. Signed-off-by: Theo Niessink <theo@taletn.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-08 14:04:41 +02:00			`if (rest == '\0' \|\| is_dir_sep(rest))`
Prevent bogus paths from being added to the index. With this one, it's now a fatal error to try to add a pathname that cannot be added with "git add", i.e. [torvalds@g5 git]$ git add .git/config fatal: unable to add .git/config to index and [torvalds@g5 git]$ git add foo/../bar fatal: unable to add foo/../bar to index instead of the old "Ignoring path xyz" warning that would end up silently succeeding on any other paths. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 21:07:31 +02:00			`return 0;`

verify_dotfile(): do not assume '/' is the path seperator verify_dotfile() currently assumes that the path seperator is '/', but on Windows it can also be '\\', so use is_dir_sep() instead. Signed-off-by: Theo Niessink <theo@taletn.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-08 14:04:41 +02:00			`switch (*rest) {`
Prevent bogus paths from being added to the index. With this one, it's now a fatal error to try to add a pathname that cannot be added with "git add", i.e. [torvalds@g5 git]$ git add .git/config fatal: unable to add .git/config to index and [torvalds@g5 git]$ git add foo/../bar fatal: unable to add foo/../bar to index instead of the old "Ignoring path xyz" warning that would end up silently succeeding on any other paths. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 21:07:31 +02:00			`/*`
			`* ".git" followed by NUL or slash is bad. This`
			`* shares the path end test with the ".." case.`
			`*/`
			`case 'g':`
			`if (rest[1] != 'i')`
			`break;`
			`if (rest[2] != 't')`
			`break;`
			`rest += 2;`
			`/* fallthrough */`
			`case '.':`
verify_dotfile(): do not assume '/' is the path seperator verify_dotfile() currently assumes that the path seperator is '/', but on Windows it can also be '\\', so use is_dir_sep() instead. Signed-off-by: Theo Niessink <theo@taletn.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-08 14:04:41 +02:00			`if (rest[1] == '\0' \|\| is_dir_sep(rest[1]))`
Prevent bogus paths from being added to the index. With this one, it's now a fatal error to try to add a pathname that cannot be added with "git add", i.e. [torvalds@g5 git]$ git add .git/config fatal: unable to add .git/config to index and [torvalds@g5 git]$ git add foo/../bar fatal: unable to add foo/../bar to index instead of the old "Ignoring path xyz" warning that would end up silently succeeding on any other paths. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 21:07:31 +02:00			`return 0;`
			`}`
			`return 1;`
			`}`

			`int verify_path(const char *path)`
			`{`
			`char c;`

verify_path: consider dos drive prefix If someone manage to create a repo with a 'C:' entry in the root-tree, files can be written outside of the working-dir. This opens up a can-of-worms of exploits. Fix it by explicitly checking for a dos drive prefix when verifying a paht. While we're at it, make sure that paths beginning with '\' is considered absolute as well. Noticed-by: Theo Niessink <theo@taletn.com> Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-27 18:00:40 +02:00			`if (has_dos_drive_prefix(path))`
			`return 0;`

Prevent bogus paths from being added to the index. With this one, it's now a fatal error to try to add a pathname that cannot be added with "git add", i.e. [torvalds@g5 git]$ git add .git/config fatal: unable to add .git/config to index and [torvalds@g5 git]$ git add foo/../bar fatal: unable to add foo/../bar to index instead of the old "Ignoring path xyz" warning that would end up silently succeeding on any other paths. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 21:07:31 +02:00			`goto inside;`
			`for (;;) {`
			`if (!c)`
			`return 1;`
verify_path: consider dos drive prefix If someone manage to create a repo with a 'C:' entry in the root-tree, files can be written outside of the working-dir. This opens up a can-of-worms of exploits. Fix it by explicitly checking for a dos drive prefix when verifying a paht. While we're at it, make sure that paths beginning with '\' is considered absolute as well. Noticed-by: Theo Niessink <theo@taletn.com> Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-27 18:00:40 +02:00			`if (is_dir_sep(c)) {`
Prevent bogus paths from being added to the index. With this one, it's now a fatal error to try to add a pathname that cannot be added with "git add", i.e. [torvalds@g5 git]$ git add .git/config fatal: unable to add .git/config to index and [torvalds@g5 git]$ git add foo/../bar fatal: unable to add foo/../bar to index instead of the old "Ignoring path xyz" warning that would end up silently succeeding on any other paths. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 21:07:31 +02:00			`inside:`
			`c = *path++;`
verify_path(): simplify check at the directory boundary We simply want to say "At a directory boundary, be careful with a name that begins with a dot, forbid a name that ends with the boundary character or has duplicated bounadry characters". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-07 05:49:06 +02:00			`if ((c == '.' && !verify_dotfile(path)) \|\|`
			`is_dir_sep(c) \|\| c == '\0')`
			`return 0;`
Prevent bogus paths from being added to the index. With this one, it's now a fatal error to try to add a pathname that cannot be added with "git add", i.e. [torvalds@g5 git]$ git add .git/config fatal: unable to add .git/config to index and [torvalds@g5 git]$ git add foo/../bar fatal: unable to add foo/../bar to index instead of the old "Ignoring path xyz" warning that would end up silently succeeding on any other paths. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 21:07:31 +02:00			`}`
			`c = *path++;`
			`}`
			`}`

Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`/*`
			`* Do we have another file that has the beginning components being a`
			`* proper superset of the name we're trying to add?`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00			`*/`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`static int has_file_name(struct index_state *istate,`
			`const struct cache_entry *ce, int pos, int ok_to_replace)`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00			`{`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`int retval = 0;`
			`int len = ce_namelen(ce);`
[PATCH] Fix oversimplified optimization for add_cache_entry(). An earlier change to optimize directory-file conflict check broke what "read-tree --emu23" expects. This is fixed by this commit. (1) Introduces an explicit flag to tell add_cache_entry() not to check for conflicts and use it when reading an existing tree into an empty stage --- by definition this case can never introduce such conflicts. (2) Makes read-cache.c:has_file_name() and read-cache.c:has_dir_name() aware of the cache stages, and flag conflict only with paths in the same stage. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-25 11:25:29 +02:00			`int stage = ce_stage(ce);`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`const char *name = ce->name;`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`while (pos < istate->cache_nr) {`
			`struct cache_entry *p = istate->cache[pos++];`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`if (len >= ce_namelen(p))`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00			`break;`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`if (memcmp(name, p->name, len))`
			`break;`
[PATCH] Fix oversimplified optimization for add_cache_entry(). An earlier change to optimize directory-file conflict check broke what "read-tree --emu23" expects. This is fixed by this commit. (1) Introduces an explicit flag to tell add_cache_entry() not to check for conflicts and use it when reading an existing tree into an empty stage --- by definition this case can never introduce such conflicts. (2) Makes read-cache.c:has_file_name() and read-cache.c:has_dir_name() aware of the cache stages, and flag conflict only with paths in the same stage. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-25 11:25:29 +02:00			`if (ce_stage(p) != stage)`
			`continue;`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`if (p->name[len] != '/')`
			`continue;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`if (p->ce_flags & CE_REMOVE)`
add_cache_entry(): removal of file foo does not conflict with foo/bar Similarly, removal of file foo/bar does not conflict with a file foo. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-30 10:55:37 +02:00			`continue;`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`retval = -1;`
			`if (!ok_to_replace)`
			`break;`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`remove_index_entry_at(istate, --pos);`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00			`}`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`return retval;`
			`}`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`/*`
			`* Do we have another file with a pathname that is a proper`
			`* subset of the name we're trying to add?`
			`*/`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`static int has_dir_name(struct index_state *istate,`
			`const struct cache_entry *ce, int pos, int ok_to_replace)`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`{`
			`int retval = 0;`
[PATCH] Fix oversimplified optimization for add_cache_entry(). An earlier change to optimize directory-file conflict check broke what "read-tree --emu23" expects. This is fixed by this commit. (1) Introduces an explicit flag to tell add_cache_entry() not to check for conflicts and use it when reading an existing tree into an empty stage --- by definition this case can never introduce such conflicts. (2) Makes read-cache.c:has_file_name() and read-cache.c:has_dir_name() aware of the cache stages, and flag conflict only with paths in the same stage. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-25 11:25:29 +02:00			`int stage = ce_stage(ce);`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`const char *name = ce->name;`
			`const char *slash = name + ce_namelen(ce);`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`for (;;) {`
			`int len;`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`for (;;) {`
			`if (*--slash == '/')`
			`break;`
			`if (slash <= ce->name)`
			`return retval;`
			`}`
			`len = slash - name;`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`pos = index_name_stage_pos(istate, name, len, stage);`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`if (pos >= 0) {`
add_cache_entry(): removal of file foo does not conflict with foo/bar Similarly, removal of file foo/bar does not conflict with a file foo. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-30 10:55:37 +02:00			`/*`
			`* Found one, but not so fast. This could`
			`* be a marker that says "I was here, but`
			`* I am being removed". Such an entry is`
			`* not a part of the resulting tree, and`
			`* it is Ok to have a directory at the same`
			`* path.`
			`*/`
read-cache.c: fix a couple more CE_REMOVE conversion It is a D/F conflict if you want to add "foo/bar" to the index when "foo" already exists. Also it is a conflict if you want to add a file "foo" when "foo/bar" exists. An exception is when the existing entry is there only to mark "I used to be here but I am being removed". This is needed for operations such as "git read-tree -m -u" that update the index and then reflect the result to the work tree --- we need to remember what to remove somewhere, and we use the index for that. In such a case, an existing file "foo" is being removed and we can create "foo/" directory and hang "bar" underneath it without any conflict. We used to use (ce->ce_mode == 0) to mark an entry that is being removed, but (CE_REMOVE & ce->ce_flags) is used for that purpose these days. An earlier commit forgot to convert the logic in the code that checks D/F conflict condition. The old code knew that "to be removed" entries cannot be at higher stage and actively checked that condition, but it was an unnecessary check. This patch removes the extra check as well. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-23 06:24:21 +01:00			`if (!(istate->cache[pos]->ce_flags & CE_REMOVE)) {`
add_cache_entry(): removal of file foo does not conflict with foo/bar Similarly, removal of file foo/bar does not conflict with a file foo. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-30 10:55:37 +02:00			`retval = -1;`
			`if (!ok_to_replace)`
			`break;`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`remove_index_entry_at(istate, pos);`
add_cache_entry(): removal of file foo does not conflict with foo/bar Similarly, removal of file foo/bar does not conflict with a file foo. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-30 10:55:37 +02:00			`continue;`
			`}`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`}`
add_cache_entry(): removal of file foo does not conflict with foo/bar Similarly, removal of file foo/bar does not conflict with a file foo. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-30 10:55:37 +02:00			`else`
			`pos = -pos-1;`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00
			`/*`
			`* Trivial optimization: if we find an entry that`
			`* already matches the sub-directory, then we know`
[PATCH] Fix oversimplified optimization for add_cache_entry(). An earlier change to optimize directory-file conflict check broke what "read-tree --emu23" expects. This is fixed by this commit. (1) Introduces an explicit flag to tell add_cache_entry() not to check for conflicts and use it when reading an existing tree into an empty stage --- by definition this case can never introduce such conflicts. (2) Makes read-cache.c:has_file_name() and read-cache.c:has_dir_name() aware of the cache stages, and flag conflict only with paths in the same stage. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-25 11:25:29 +02:00			`* we're ok, and we can exit.`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`*/`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`while (pos < istate->cache_nr) {`
			`struct cache_entry *p = istate->cache[pos];`
[PATCH] Fix oversimplified optimization for add_cache_entry(). An earlier change to optimize directory-file conflict check broke what "read-tree --emu23" expects. This is fixed by this commit. (1) Introduces an explicit flag to tell add_cache_entry() not to check for conflicts and use it when reading an existing tree into an empty stage --- by definition this case can never introduce such conflicts. (2) Makes read-cache.c:has_file_name() and read-cache.c:has_dir_name() aware of the cache stages, and flag conflict only with paths in the same stage. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-25 11:25:29 +02:00			`if ((ce_namelen(p) <= len) \|\|`
			`(p->name[len] != '/') \|\|`
			`memcmp(p->name, name, len))`
			`break; /* not our subdirectory */`
read-cache.c: fix a couple more CE_REMOVE conversion It is a D/F conflict if you want to add "foo/bar" to the index when "foo" already exists. Also it is a conflict if you want to add a file "foo" when "foo/bar" exists. An exception is when the existing entry is there only to mark "I used to be here but I am being removed". This is needed for operations such as "git read-tree -m -u" that update the index and then reflect the result to the work tree --- we need to remember what to remove somewhere, and we use the index for that. In such a case, an existing file "foo" is being removed and we can create "foo/" directory and hang "bar" underneath it without any conflict. We used to use (ce->ce_mode == 0) to mark an entry that is being removed, but (CE_REMOVE & ce->ce_flags) is used for that purpose these days. An earlier commit forgot to convert the logic in the code that checks D/F conflict condition. The old code knew that "to be removed" entries cannot be at higher stage and actively checked that condition, but it was an unnecessary check. This patch removes the extra check as well. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-23 06:24:21 +01:00			`if (ce_stage(p) == stage && !(p->ce_flags & CE_REMOVE))`
			`/*`
			`* p is at the same stage as our entry, and`
[PATCH] Fix oversimplified optimization for add_cache_entry(). An earlier change to optimize directory-file conflict check broke what "read-tree --emu23" expects. This is fixed by this commit. (1) Introduces an explicit flag to tell add_cache_entry() not to check for conflicts and use it when reading an existing tree into an empty stage --- by definition this case can never introduce such conflicts. (2) Makes read-cache.c:has_file_name() and read-cache.c:has_dir_name() aware of the cache stages, and flag conflict only with paths in the same stage. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-25 11:25:29 +02:00			`* is a subdirectory of what we are looking`
			`* at, so we cannot have conflicts at our`
			`* level or anything shorter.`
			`*/`
			`return retval;`
			`pos++;`
Add git-update-cache --replace option. When "path" exists as a file or a symlink in the index, an attempt to add "path/file" is refused because it results in file vs directory conflict. Similarly when "path/file1", "path/file2", etc. exist, an attempt to add "path" as a file or a symlink is refused. With git-update-cache --replace, these existing entries that conflict with the entry being added are automatically removed from the cache, with warning messages. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:55:21 +02:00			`}`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00			`}`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`return retval;`
			`}`

			`/* We may be in a situation where we already have path/file and path`
			`* is being added, or we already have path and path/file is being`
			`* added. Either one would result in a nonsense tree that has path`
			`* twice when git-write-tree tries to write it out. Prevent it.`
War on whitespace This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-07 09:04:01 +02:00			`*`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`* If ok-to-replace is specified, we remove the conflicting entries`
			`* from the cache so the caller should recompute the insert position.`
			`* When this happens, we return non-zero.`
			`*/`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`static int check_file_directory_conflict(struct index_state *istate,`
			`const struct cache_entry *ce,`
			`int pos, int ok_to_replace)`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`{`
add_cache_entry(): removal of file foo does not conflict with foo/bar Similarly, removal of file foo/bar does not conflict with a file foo. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-30 10:55:37 +02:00			`int retval;`

			`/*`
			`* When ce is an "I am going away" entry, we allow it to be added`
			`*/`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`if (ce->ce_flags & CE_REMOVE)`
add_cache_entry(): removal of file foo does not conflict with foo/bar Similarly, removal of file foo/bar does not conflict with a file foo. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-30 10:55:37 +02:00			`return 0;`

Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`/*`
			`* We check if the path is a sub-path of a subsequent pathname`
			`* first, since removing those will not change the position`
add_cache_entry(): removal of file foo does not conflict with foo/bar Similarly, removal of file foo/bar does not conflict with a file foo. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-30 10:55:37 +02:00			`* in the array.`
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`*/`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`retval = has_file_name(istate, ce, pos, ok_to_replace);`
add_cache_entry(): removal of file foo does not conflict with foo/bar Similarly, removal of file foo/bar does not conflict with a file foo. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-30 10:55:37 +02:00
Re-implement "check_file_directory_conflict()" This is (imho) more readable, and is also a lot faster. The expense of looking up sub-directory beginnings was killing us on things like "git-diff-cache", even though that one didn't even care at all about the file vs directory conflicts. We really only care when somebody tries to add a conflicting name to stage 0. We should go through the conflict rules more carefully some day. 2005-06-19 05:21:34 +02:00			`/*`
			`* Then check if the path might have a clashing sub-directory`
			`* before it.`
			`*/`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`return retval + has_dir_name(istate, ce, pos, ok_to_replace);`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00			`}`

Optimize "diff --cached" performance. The read_tree() function is called only from the call chain to run "git diff --cached" (this includes the internal call made by git-runstatus to run_diff_index()). The function vacates stage without any funky "merge" magic. The caller then goes and compares stage #1 entries from the tree with stage #0 entries from the original index. When adding the cache entries this way, it used the general purpose add_cache_entry(). This function looks for an existing entry to replace or if there is none to find where to insert the new entry, resolves D/F conflict and all the other things. For the purpose of reading entries into an empty stage, none of that processing is needed. We can instead append everything and then sort the result at the end. This commit changes read_tree() to first make sure that there is no existing cache entries at specified stage, and if that is the case, it runs add_cache_entry() with ADD_CACHE_JUST_APPEND flag (new), and then sort the resulting cache using qsort(). This new flag tells add_cache_entry() to omit all the checks such as "Does this path already exist? Does adding this path remove other existing entries because it turns a directory to a file?" and instead append the given cache entry straight at the end of the active cache. The caller of course is expected to sort the resulting cache at the end before using the result. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-08-09 22:42:50 +02:00			`static int add_index_entry_with_check(struct index_state istate, struct cache_entry ce, int option)`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`{`
			`int pos;`
Add git-update-cache --replace option. When "path" exists as a file or a symlink in the index, an attempt to add "path/file" is refused because it results in file vs directory conflict. Similarly when "path/file1", "path/file2", etc. exist, an attempt to add "path" as a file or a symlink is refused. With git-update-cache --replace, these existing entries that conflict with the entry being added are automatically removed from the cache, with warning messages. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:55:21 +02:00			`int ok_to_add = option & ADD_CACHE_OK_TO_ADD;`
			`int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;`
[PATCH] Fix oversimplified optimization for add_cache_entry(). An earlier change to optimize directory-file conflict check broke what "read-tree --emu23" expects. This is fixed by this commit. (1) Introduces an explicit flag to tell add_cache_entry() not to check for conflicts and use it when reading an existing tree into an empty stage --- by definition this case can never introduce such conflicts. (2) Makes read-cache.c:has_file_name() and read-cache.c:has_dir_name() aware of the cache stages, and flag conflict only with paths in the same stage. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-25 11:25:29 +02:00			`int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;`
git-add --intent-to-add (-N) This adds "--intent-to-add" option to "git add". This is to let the system know that you will tell it the final contents to be staged later, iow, just be aware of the presense of the path with the type of the blob for now. It is implemented by staging an empty blob as the content. With this sequence: $ git reset --hard $ edit newfile $ git add -N newfile $ edit newfile oldfile $ git diff the diff will show all changes relative to the current commit. Then you can do: $ git commit -a ;# commit everything or $ git commit oldfile ;# only oldfile, newfile not yet added to pretend you are working with an index-free system like CVS. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-21 10:44:53 +02:00			`int new_only = option & ADD_CACHE_NEW_ONLY;`
"Assume unchanged" git This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-09 06:15:24 +01:00
Simplify cache API Earlier, add_file_to_index() invalidated the path in the cache-tree but remove_file_from_cache() did not, and the user of the latter needed to invalidate the entry himself. This led to a few bugs due to missed invalidate calls already. This patch makes the management of cache-tree less error prone by making more invalidate calls from lower level cache API functions. The rules are: - If you are going to write the index, you should either maintain cache_tree correctly. - If you cannot, alternatively you can remove the entire cache_tree by calling cache_tree_free() before you call write_cache(). - When you modify the index, cache_tree_invalidate_path() should be called with the path you are modifying, to discard the entry from the cache-tree structure. - The following cache API functions exported from read-cache.c (and the macro whose names have "cache" instead of "index") automatically call cache_tree_invalidate_path() for you: - remove_file_from_index(); - add_file_to_index(); - add_index_entry(); You can modify the index bypassing the above API functions (e.g. find an existing cache entry from the index and modify it in place). You need to call cache_tree_invalidate_path() yourself in such a case. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-14 05:33:11 +02:00			`cache_tree_invalidate_path(istate->cache_tree, ce->name);`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00
Use core.filemode. With "[core] filemode = false", you can tell git to ignore differences in the working tree file only in executable bit. * "git-update-index --refresh" does not say "needs update" if index entry and working tree file differs only in executable bit. * "git-update-index" on an existing path takes executable bit from the existing index entry, if the path and index entry are both regular files. * "git-diff-files" and "git-diff-index" without --cached flag pretend the path on the filesystem has the same executable bit as the existing index entry, if the path and index entry are both regular files. If you are on a filesystem with unreliable mode bits, you may need to force the executable bit after registering the path in the index. * "git-update-index --chmod=+x foo" flips the executable bit of the index file entry for path "foo" on. Use "--chmod=-x" to flip it off. Note that --chmod only works in index file and does not look at nor update the working tree. So if you are on a filesystem and do not have working executable bit, you would do: 1. set the appropriate .git/config option; 2. "git-update-index --add new-file.c" 3. "git-ls-files --stage new-file.c" to see if it has the desired mode bits. If not, e.g. to drop executable bit picked up from the filesystem, say "git-update-index --chmod=-x new-file.c". Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 03:45:33 +02:00			`/* existing match? Just replace it. */`
Fix off-by-one error in removal of cache entry. Also make the return value of "cache_name_pos()" be sane: positive or zero if we found it (it's the index into the cache array), and "-pos-1" to indicate where it should go if we didn't. 2005-04-11 07:06:50 +02:00			`if (pos >= 0) {`
git-add --intent-to-add (-N) This adds "--intent-to-add" option to "git add". This is to let the system know that you will tell it the final contents to be staged later, iow, just be aware of the presense of the path with the type of the blob for now. It is implemented by staging an empty blob as the content. With this sequence: $ git reset --hard $ edit newfile $ git add -N newfile $ edit newfile oldfile $ git diff the diff will show all changes relative to the current commit. Then you can do: $ git commit -a ;# commit everything or $ git commit oldfile ;# only oldfile, newfile not yet added to pretend you are working with an index-free system like CVS. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-21 10:44:53 +02:00			`if (!new_only)`
			`replace_index_entry(istate, pos, ce);`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`return 0;`
			`}`
Fix off-by-one error in removal of cache entry. Also make the return value of "cache_name_pos()" be sane: positive or zero if we found it (it's the index into the cache array), and "-pos-1" to indicate where it should go if we didn't. 2005-04-11 07:06:50 +02:00			`pos = -pos-1;`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00
When inserting a index entry of stage 0, remove all old unmerged entries. This allows you to actually tell git that you've resolved a conflict. 2005-04-16 21:05:45 +02:00			`/*`
			`* Inserting a merged entry ("stage 0") into the index`
			`* will always replace all non-merged entries..`
			`*/`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`if (pos < istate->cache_nr && ce_stage(ce) == 0) {`
			`while (ce_same_name(istate->cache[pos], ce)) {`
When inserting a index entry of stage 0, remove all old unmerged entries. This allows you to actually tell git that you've resolved a conflict. 2005-04-16 21:05:45 +02:00			`ok_to_add = 1;`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`if (!remove_index_entry_at(istate, pos))`
When inserting a index entry of stage 0, remove all old unmerged entries. This allows you to actually tell git that you've resolved a conflict. 2005-04-16 21:05:45 +02:00			`break;`
			`}`
			`}`

Make "update-cache" a bit friendlier to use (and harder to mis-use). It now requires the "--add" flag before you add any new files, and a "--remove" file if you want to mark files for removal. And giving it the "--refresh" flag makes it just update all the files that it already knows about. 2005-04-10 20:32:54 +02:00			`if (!ok_to_add)`
			`return -1;`
Prevent bogus paths from being added to the index. With this one, it's now a fatal error to try to add a pathname that cannot be added with "git add", i.e. [torvalds@g5 git]$ git add .git/config fatal: unable to add .git/config to index and [torvalds@g5 git]$ git add foo/../bar fatal: unable to add foo/../bar to index instead of the old "Ignoring path xyz" warning that would end up silently succeeding on any other paths. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-18 21:07:31 +02:00			`if (!verify_path(ce->name))`
print an error message for invalid path If verification of path failed, it is always better to print an error message saying this than relying on the caller function to print a meaningful error message (especially when the callee already prints error message for another situation). Because the callers of add_index_entry_with_check() did not print any error message, it resulted that the user would not notice the problem when checkout of an invalid path failed. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-11 18:39:37 +02:00			`return error("Invalid path '%s'", ce->name);`
Make "update-cache" a bit friendlier to use (and harder to mis-use). It now requires the "--add" flag before you add any new files, and a "--remove" file if you want to mark files for removal. And giving it the "--refresh" flag makes it just update all the files that it already knows about. 2005-04-10 20:32:54 +02:00
Use core.filemode. With "[core] filemode = false", you can tell git to ignore differences in the working tree file only in executable bit. * "git-update-index --refresh" does not say "needs update" if index entry and working tree file differs only in executable bit. * "git-update-index" on an existing path takes executable bit from the existing index entry, if the path and index entry are both regular files. * "git-diff-files" and "git-diff-index" without --cached flag pretend the path on the filesystem has the same executable bit as the existing index entry, if the path and index entry are both regular files. If you are on a filesystem with unreliable mode bits, you may need to force the executable bit after registering the path in the index. * "git-update-index --chmod=+x foo" flips the executable bit of the index file entry for path "foo" on. Use "--chmod=-x" to flip it off. Note that --chmod only works in index file and does not look at nor update the working tree. So if you are on a filesystem and do not have working executable bit, you would do: 1. set the appropriate .git/config option; 2. "git-update-index --add new-file.c" 3. "git-ls-files --stage new-file.c" to see if it has the desired mode bits. If not, e.g. to drop executable bit picked up from the filesystem, say "git-update-index --chmod=-x new-file.c". Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-12 03:45:33 +02:00			`if (!skip_df_check &&`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`check_file_directory_conflict(istate, ce, pos, ok_to_replace)) {`
Add git-update-cache --replace option. When "path" exists as a file or a symlink in the index, an attempt to add "path/file" is refused because it results in file vs directory conflict. Similarly when "path/file1", "path/file2", etc. exist, an attempt to add "path" as a file or a symlink is refused. With git-update-cache --replace, these existing entries that conflict with the entry being added are automatically removed from the cache, with warning messages. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:55:21 +02:00			`if (!ok_to_replace)`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`return error("'%s' appears as both a file and as a directory",`
			`ce->name);`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));`
Add git-update-cache --replace option. When "path" exists as a file or a symlink in the index, an attempt to add "path/file" is refused because it results in file vs directory conflict. Similarly when "path/file1", "path/file2", etc. exist, an attempt to add "path" as a file or a symlink is refused. With git-update-cache --replace, these existing entries that conflict with the entry being added are automatically removed from the cache, with warning messages. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:55:21 +02:00			`pos = -pos-1;`
			`}`
Optimize "diff --cached" performance. The read_tree() function is called only from the call chain to run "git diff --cached" (this includes the internal call made by git-runstatus to run_diff_index()). The function vacates stage without any funky "merge" magic. The caller then goes and compares stage #1 entries from the tree with stage #0 entries from the original index. When adding the cache entries this way, it used the general purpose add_cache_entry(). This function looks for an existing entry to replace or if there is none to find where to insert the new entry, resolves D/F conflict and all the other things. For the purpose of reading entries into an empty stage, none of that processing is needed. We can instead append everything and then sort the result at the end. This commit changes read_tree() to first make sure that there is no existing cache entries at specified stage, and if that is the case, it runs add_cache_entry() with ADD_CACHE_JUST_APPEND flag (new), and then sort the resulting cache using qsort(). This new flag tells add_cache_entry() to omit all the checks such as "Does this path already exist? Does adding this path remove other existing entries because it turns a directory to a file?" and instead append the given cache entry straight at the end of the active cache. The caller of course is expected to sort the resulting cache at the end before using the result. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-08-09 22:42:50 +02:00			`return pos + 1;`
			`}`

			`int add_index_entry(struct index_state istate, struct cache_entry ce, int option)`
			`{`
			`int pos;`

			`if (option & ADD_CACHE_JUST_APPEND)`
			`pos = istate->cache_nr;`
			`else {`
			`int ret;`
			`ret = add_index_entry_with_check(istate, ce, option);`
			`if (ret <= 0)`
			`return ret;`
			`pos = ret - 1;`
			`}`
git-update-cache refuses to add a file where a directory is registed. And vice versa. The next commit will introduce an option --replace to allow replacing existing entries. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-05-08 06:48:12 +02:00
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`/* Make sure the array is big enough .. */`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`if (istate->cache_nr == istate->cache_alloc) {`
			`istate->cache_alloc = alloc_nr(istate->cache_alloc);`
			`istate->cache = xrealloc(istate->cache,`
read-cache: trivial style cleanups Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-30 15:56:19 +02:00			`istate->cache_alloc * sizeof(*istate->cache));`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`}`

			`/* Add it in.. */`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`istate->cache_nr++;`
Optimize "diff --cached" performance. The read_tree() function is called only from the call chain to run "git diff --cached" (this includes the internal call made by git-runstatus to run_diff_index()). The function vacates stage without any funky "merge" magic. The caller then goes and compares stage #1 entries from the tree with stage #0 entries from the original index. When adding the cache entries this way, it used the general purpose add_cache_entry(). This function looks for an existing entry to replace or if there is none to find where to insert the new entry, resolves D/F conflict and all the other things. For the purpose of reading entries into an empty stage, none of that processing is needed. We can instead append everything and then sort the result at the end. This commit changes read_tree() to first make sure that there is no existing cache entries at specified stage, and if that is the case, it runs add_cache_entry() with ADD_CACHE_JUST_APPEND flag (new), and then sort the resulting cache using qsort(). This new flag tells add_cache_entry() to omit all the checks such as "Does this path already exist? Does adding this path remove other existing entries because it turns a directory to a file?" and instead append the given cache entry straight at the end of the active cache. The caller of course is expected to sort the resulting cache at the end before using the result. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-08-09 22:42:50 +02:00			`if (istate->cache_nr > pos + 1)`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`memmove(istate->cache + pos + 1,`
			`istate->cache + pos,`
			`(istate->cache_nr - pos - 1) * sizeof(ce));`
Create pathname-based hash-table lookup into index This creates a hash index of every single file added to the index. Right now that hash index isn't actually used for much: I implemented a "cache_name_exists()" function that uses it to efficiently look up a filename in the index without having to do the O(logn) binary search, but quite frankly, that's not why this patch is interesting. No, the whole and only reason to create the hash of the filenames in the index is that by modifying the hash function, you can fairly easily do things like making it always hash equivalent names into the same bucket. That, in turn, means that suddenly questions like "does this name exist in the index under an _equivalent_ name?" becomes much much cheaper. Guiding principles behind this patch: - it shouldn't be too costly. In fact, my primary goal here was to actually speed up "git commit" with a fully populated kernel tree, by being faster at checking whether a file already existed in the index. I did succeed, but only barely: Best before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.255s user 0m0.168s sys 0m0.088s Best after: [torvalds@woody linux]$ time ~/git/git commit > /dev/null real 0m0.233s user 0m0.144s sys 0m0.088s so some things are actually faster (~8%). Caveat: that's really the best case. Other things are invariably going to be slightly slower, since we populate that index cache, and quite frankly, few things really use it to look things up. That said, the cost is really quite small. The worst case is probably doing a "git ls-files", which will do very little except puopulate the index, and never actually looks anything up in it, just lists it. Before: [torvalds@woody linux]$ time git ls-files > /dev/null real 0m0.016s user 0m0.016s sys 0m0.000s After: [torvalds@woody linux]$ time ~/git/git ls-files > /dev/null real 0m0.021s user 0m0.012s sys 0m0.008s and while the thing has really gotten relatively much slower, we're still talking about something almost unmeasurable (eg 5ms). And that really should be pretty much the worst case. So we lose 5ms on one "benchmark", but win 22ms on another. Pick your poison - this patch has the advantage that it will _likely_ speed up the cases that are complex and expensive more than it slows down the cases that are already so fast that nobody cares. But if you look at relative speedups/slowdowns, it doesn't look so good. - It should be simple and clean The code may be a bit subtle (the reasons I do hash removal the way I do etc), but it re-uses the existing hash.c files, so it really is fairly small and straightforward apart from a few odd details. Now, this patch on its own doesn't really do much, but I think it's worth looking at, if only because if done correctly, the name hashing really can make an improvement to the whole issue of "do we have a filename that looks like this in the index already". And at least it gets real testing by being used even by default (ie there is a real use-case for it even without any insane filesystems). NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just not ashamed of it enough to really care. I took all the numbers out of my nether regions - I'm sure it's good enough that it works in practice, but the whole point was that you can make a really much fancier hash that hashes characters not directly, but by their upper-case value or something like that, and thus you get a case-insensitive hash, while still keeping the name and the index itself totally case sensitive. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-23 03:41:14 +01:00			`set_index_entry(istate, pos, ce);`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`istate->cache_changed = 1;`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`return 0;`
			`}`

Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`/*`
			`* "refresh" does not calculate a new sha1 file or bring the`
			`* cache up-to-date for mode/content changes. But what it`
			`* _does_ do is to "re-match" the stat information of a file`
			`* with the cache, so that you can refresh the cache for a`
			`* file that hasn't been changed but where the stat entry is`
			`* out of date.`
			`*`
			`* For example, you'd want to do this after doing a "git-read-tree",`
			`* to link up the stat cache details with the proper files.`
			`*/`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`static struct cache_entry refresh_cache_ent(struct index_state istate,`
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`struct cache_entry *ce,`
read-cache: let refresh_cache_ent pass up changed flags This will enable refresh_cache to differentiate more cases of modification (such as typechange) when telling the user what isn't fresh. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:11:08 +01:00			`unsigned int options, int *err,`
			`int *changed_ret)`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`{`
			`struct stat st;`
			`struct cache_entry *updated;`
			`int changed, size;`
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`int ignore_valid = options & CE_MATCH_IGNORE_VALID;`
ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID Previously CE_MATCH_IGNORE_VALID flag is used by both valid and skip-worktree bits. While the two bits have similar behaviour, sharing this flag means "git update-index --really-refresh" will ignore skip-worktree while it should not. Instead another flag is introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only applies to valid bit. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-14 12:43:58 +01:00			`int ignore_skip_worktree = options & CE_MATCH_IGNORE_SKIP_WORKTREE;`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00
Avoid running lstat(2) on the same cache entry. Aside from the lstat(2) done for work tree files, there are quite many lstat(2) calls in refname dwimming codepath. This patch is not about reducing them. * It adds a new ce_flag, CE_UPTODATE, that is meant to mark the cache entries that record a regular file blob that is up to date in the work tree. If somebody later walks the index and wants to see if the work tree has changes, they do not have to be checked with lstat(2) again. * fill_stat_cache_info() marks the cache entry it just added with CE_UPTODATE. This has the effect of marking the paths we write out of the index and lstat(2) immediately as "no need to lstat -- we know it is up-to-date", from quite a lot fo callers: - git-apply --index - git-update-index - git-checkout-index - git-add (uses add_file_to_index()) - git-commit (ditto) - git-mv (ditto) * refresh_cache_ent() also marks the cache entry that are clean with CE_UPTODATE. * write_index is changed not to write CE_UPTODATE out to the index file, because CE_UPTODATE is meant to be transient only in core. For the same reason, CE_UPDATE is not written to prevent an accident from happening. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-19 08:45:24 +01:00			`if (ce_uptodate(ce))`
			`return ce;`

Add shortcut in refresh_cache_ent() for marked entries. When a cache entry has been marked as CE_VALID, the user has promised us that any change in the work tree does not matter. Just mark the entry as up-to-date, and continue. Signed-off-by: Marius Storm-Olsen <marius@trolltech.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-30 14:38:35 +02:00			`/*`
ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID Previously CE_MATCH_IGNORE_VALID flag is used by both valid and skip-worktree bits. While the two bits have similar behaviour, sharing this flag means "git update-index --really-refresh" will ignore skip-worktree while it should not. Instead another flag is introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only applies to valid bit. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-14 12:43:58 +01:00			`* CE_VALID or CE_SKIP_WORKTREE means the user promised us`
			`* that the change to the work tree does not matter and told`
			`* us not to worry.`
Add shortcut in refresh_cache_ent() for marked entries. When a cache entry has been marked as CE_VALID, the user has promised us that any change in the work tree does not matter. Just mark the entry as up-to-date, and continue. Signed-off-by: Marius Storm-Olsen <marius@trolltech.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-30 14:38:35 +02:00			`*/`
ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID Previously CE_MATCH_IGNORE_VALID flag is used by both valid and skip-worktree bits. While the two bits have similar behaviour, sharing this flag means "git update-index --really-refresh" will ignore skip-worktree while it should not. Instead another flag is introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only applies to valid bit. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-14 12:43:58 +01:00			`if (!ignore_skip_worktree && ce_skip_worktree(ce)) {`
			`ce_mark_uptodate(ce);`
			`return ce;`
			`}`
Add shortcut in refresh_cache_ent() for marked entries. When a cache entry has been marked as CE_VALID, the user has promised us that any change in the work tree does not matter. Just mark the entry as up-to-date, and continue. Signed-off-by: Marius Storm-Olsen <marius@trolltech.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-30 14:38:35 +02:00			`if (!ignore_valid && (ce->ce_flags & CE_VALID)) {`
			`ce_mark_uptodate(ce);`
			`return ce;`
			`}`

Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`if (lstat(ce->name, &st) < 0) {`
Propagate cache error internal to refresh_cache() via parameter. The function refresh_cache() is the only user of cache_errno that switches its behaviour based on what internal function refresh_cache_entry() finds; pass the error status back in a parameter passed down to it, to get rid of the global variable cache_errno. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 06:34:34 +02:00			`if (err)`
			`*err = errno;`
Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`return NULL;`
			`}`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`changed = ie_match_stat(istate, ce, &st, options);`
read-cache: let refresh_cache_ent pass up changed flags This will enable refresh_cache to differentiate more cases of modification (such as typechange) when telling the user what isn't fresh. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:11:08 +01:00			`if (changed_ret)`
			`*changed_ret = changed;`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`if (!changed) {`
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`/*`
			`* The path is unchanged. If we were told to ignore`
			`* valid bit, then we did the actual stat check and`
			`* found that the entry is unmodified. If the entry`
			`* is not marked VALID, this is the place to mark it`
			`* valid again, under "assume unchanged" mode.`
			`*/`
			`if (ignore_valid && assume_unchanged &&`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`!(ce->ce_flags & CE_VALID))`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`; /* mark this one VALID again */`
Avoid running lstat(2) on the same cache entry. Aside from the lstat(2) done for work tree files, there are quite many lstat(2) calls in refname dwimming codepath. This patch is not about reducing them. * It adds a new ce_flag, CE_UPTODATE, that is meant to mark the cache entries that record a regular file blob that is up to date in the work tree. If somebody later walks the index and wants to see if the work tree has changes, they do not have to be checked with lstat(2) again. * fill_stat_cache_info() marks the cache entry it just added with CE_UPTODATE. This has the effect of marking the paths we write out of the index and lstat(2) immediately as "no need to lstat -- we know it is up-to-date", from quite a lot fo callers: - git-apply --index - git-update-index - git-checkout-index - git-add (uses add_file_to_index()) - git-commit (ditto) - git-mv (ditto) * refresh_cache_ent() also marks the cache entry that are clean with CE_UPTODATE. * write_index is changed not to write CE_UPTODATE out to the index file, because CE_UPTODATE is meant to be transient only in core. For the same reason, CE_UPDATE is not written to prevent an accident from happening. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-19 08:45:24 +01:00			`else {`
			`/*`
			`* We do not mark the index itself "modified"`
			`* because CE_UPTODATE flag is in-core only;`
			`* we are not going to write this change out.`
			`*/`
Make ce_uptodate() trustworthy again The rule has always been that a cache entry that is ce_uptodate(ce) means that we already have checked the work tree entity and we know there is no change in the work tree compared to the index, and nobody should have to double check. Note that false ce_uptodate(ce) does not mean it is known to be dirty---it only means we don't know if it is clean. There are a few codepaths (refresh-index and preload-index are among them) that mark a cache entry as up-to-date based solely on the return value from ie_match_stat(); this function uses lstat() to see if the work tree entity has been touched, and for a submodule entry, if its HEAD points at the same commit as the commit recorded in the index of the superproject (a submodule that is not even cloned is considered clean). A submodule is no longer considered unmodified merely because its HEAD matches the index of the superproject these days, in order to prevent people from forgetting to commit in the submodule and updating the superproject index with the new submodule commit, before commiting the state in the superproject. However, the patch to do so didn't update the codepath that marks cache entries up-to-date based on the updated definition and instead worked it around by saying "we don't trust the return value of ce_uptodate() for submodules." This makes ce_uptodate() trustworthy again by not marking submodule entries up-to-date. The next step _could_ be to introduce a few "in-core" flag bits to cache_entry structure to record "this entry is _known_ to be dirty", call is_submodule_modified() from ie_match_stat(), and use these new bits to avoid running this rather expensive check more than once, but that can be a separate patch. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-24 09:10:20 +01:00			`if (!S_ISGITLINK(ce->ce_mode))`
			`ce_mark_uptodate(ce);`
Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`return ce;`
Avoid running lstat(2) on the same cache entry. Aside from the lstat(2) done for work tree files, there are quite many lstat(2) calls in refname dwimming codepath. This patch is not about reducing them. * It adds a new ce_flag, CE_UPTODATE, that is meant to mark the cache entries that record a regular file blob that is up to date in the work tree. If somebody later walks the index and wants to see if the work tree has changes, they do not have to be checked with lstat(2) again. * fill_stat_cache_info() marks the cache entry it just added with CE_UPTODATE. This has the effect of marking the paths we write out of the index and lstat(2) immediately as "no need to lstat -- we know it is up-to-date", from quite a lot fo callers: - git-apply --index - git-update-index - git-checkout-index - git-add (uses add_file_to_index()) - git-commit (ditto) - git-mv (ditto) * refresh_cache_ent() also marks the cache entry that are clean with CE_UPTODATE. * write_index is changed not to write CE_UPTODATE out to the index file, because CE_UPTODATE is meant to be transient only in core. For the same reason, CE_UPDATE is not written to prevent an accident from happening. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-19 08:45:24 +01:00			`}`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`}`

ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`if (ie_modified(istate, ce, &st, options)) {`
Propagate cache error internal to refresh_cache() via parameter. The function refresh_cache() is the only user of cache_errno that switches its behaviour based on what internal function refresh_cache_entry() finds; pass the error status back in a parameter passed down to it, to get rid of the global variable cache_errno. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 06:34:34 +02:00			`if (err)`
			`*err = EINVAL;`
Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`return NULL;`
			`}`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00
			`size = ce_size(ce);`
			`updated = xmalloc(size);`
			`memcpy(updated, ce, size);`
			`fill_stat_cache_info(updated, &st);`
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`/*`
			`* If ignore_valid is not set, we should leave CE_VALID bit`
			`* alone. Otherwise, paths marked with --no-assume-unchanged`
			`* (i.e. things to be edited) will reacquire CE_VALID bit`
			`* automatically, which is not really what we want.`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`*/`
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`if (!ignore_valid && assume_unchanged &&`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`!(ce->ce_flags & CE_VALID))`
			`updated->ce_flags &= ~CE_VALID;`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00
			`return updated;`
			`}`

reset: make the reminder output consistent with "checkout" git reset without argument displays a summary of the local modification, like this: $ git reset Makefile: locally modified Some people have problems with this; they look like an error message. This patch makes its output mimic how "git checkout $another_branch" reports the paths with local modifications. "git add --refresh --verbose" is changed in the same way. It also adds a header to make it clear that the output is informative, and not an error. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> 2009-08-21 10:57:59 +02:00			`static void show_file(const char * fmt, const char * name, int in_porcelain,`
update-index --refresh --porcelain: add missing const Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-22 23:43:23 +01:00			`int * first, const char *header_msg)`
reset: make the reminder output consistent with "checkout" git reset without argument displays a summary of the local modification, like this: $ git reset Makefile: locally modified Some people have problems with this; they look like an error message. This patch makes its output mimic how "git checkout $another_branch" reports the paths with local modifications. "git add --refresh --verbose" is changed in the same way. It also adds a header to make it clear that the output is informative, and not an error. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> 2009-08-21 10:57:59 +02:00			`{`
			`if (in_porcelain && *first && header_msg) {`
			`printf("%s\n", header_msg);`
whitespace: have SP on both sides of an assignment "=" I've deliberately excluded the borrowed code in compat/nedmalloc directory. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-25 23:46:52 +02:00			`*first = 0;`
reset: make the reminder output consistent with "checkout" git reset without argument displays a summary of the local modification, like this: $ git reset Makefile: locally modified Some people have problems with this; they look like an error message. This patch makes its output mimic how "git checkout $another_branch" reports the paths with local modifications. "git add --refresh --verbose" is changed in the same way. It also adds a header to make it clear that the output is informative, and not an error. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> 2009-08-21 10:57:59 +02:00			`}`
			`printf(fmt, name);`
			`}`

convert refresh_index to take struct pathspec Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:35:54 +02:00			`int refresh_index(struct index_state *istate, unsigned int flags,`
			`const struct pathspec *pathspec,`
update-index --refresh --porcelain: add missing const Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-22 23:43:23 +01:00			`char seen, const char header_msg)`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`{`
			`int i;`
			`int has_errors = 0;`
			`int really = (flags & REFRESH_REALLY) != 0;`
			`int allow_unmerged = (flags & REFRESH_UNMERGED) != 0;`
			`int quiet = (flags & REFRESH_QUIET) != 0;`
			`int not_new = (flags & REFRESH_IGNORE_MISSING) != 0;`
Teach update-index about --ignore-submodules Like with the diff machinery, update-index should sometimes just ignore submodules (e.g. to determine a clean state before a rebase). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-14 19:03:45 +02:00			`int ignore_submodules = (flags & REFRESH_IGNORE_SUBMODULES) != 0;`
reset: make the reminder output consistent with "checkout" git reset without argument displays a summary of the local modification, like this: $ git reset Makefile: locally modified Some people have problems with this; they look like an error message. This patch makes its output mimic how "git checkout $another_branch" reports the paths with local modifications. "git add --refresh --verbose" is changed in the same way. It also adds a header to make it clear that the output is informative, and not an error. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> 2009-08-21 10:57:59 +02:00			`int first = 1;`
			`int in_porcelain = (flags & REFRESH_IN_PORCELAIN);`
ce_match_stat, run_diff_files: use symbolic constants for readability ce_match_stat() can be told: (1) to ignore CE_VALID bit (used under "assume unchanged" mode) and perform the stat comparison anyway; (2) not to perform the contents comparison for racily clean entries and report mismatch of cached stat information; using its "option" parameter. Give them symbolic constants. Similarly, run_diff_files() can be told not to report anything on removed paths. Also give it a symbolic constant for that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 09:15:03 +01:00			`unsigned int options = really ? CE_MATCH_IGNORE_VALID : 0;`
refresh_index: rename format variables When refreshing the index, for modified (or unmerged) files we will print "needs update" (or "needs merge") for plumbing, or line similar to the output from "diff --name-status" for porcelain. The variables holding which type of message to show are named after the plumbing messages. However, as we begin to differentiate more cases at the porcelain level (with the plumbing message staying the same), that naming scheme will become awkward. Instead, name the variables after which case we found (modified or unmerged), not what we will output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:11:28 +01:00			`const char *modified_fmt;`
refresh_index: make porcelain output more specific If you have a deleted file and a porcelain refreshes the cache, we print: Unstaged changes after reset: M file This is technically correct, in that the file is modified, but it's friendlier to the user if we further differentiate the case of a deleted file (especially because this output looks a lot like "diff --name-status", which would also make the distinction). Similarly, we can distinguish typechanges ("T") and intent-to-add files ("A"), both of which appear as just "M" in the current output. The plumbing output for all cases remains "needs update" for historical compatibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:13:08 +01:00			`const char *deleted_fmt;`
			`const char *typechange_fmt;`
			`const char *added_fmt;`
refresh_index: rename format variables When refreshing the index, for modified (or unmerged) files we will print "needs update" (or "needs merge") for plumbing, or line similar to the output from "diff --name-status" for porcelain. The variables holding which type of message to show are named after the plumbing messages. However, as we begin to differentiate more cases at the porcelain level (with the plumbing message staying the same), that naming scheme will become awkward. Instead, name the variables after which case we found (modified or unmerged), not what we will output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:11:28 +01:00			`const char *unmerged_fmt;`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00
refresh_index: rename format variables When refreshing the index, for modified (or unmerged) files we will print "needs update" (or "needs merge") for plumbing, or line similar to the output from "diff --name-status" for porcelain. The variables holding which type of message to show are named after the plumbing messages. However, as we begin to differentiate more cases at the porcelain level (with the plumbing message staying the same), that naming scheme will become awkward. Instead, name the variables after which case we found (modified or unmerged), not what we will output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:11:28 +01:00			`modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");`
refresh_index: make porcelain output more specific If you have a deleted file and a porcelain refreshes the cache, we print: Unstaged changes after reset: M file This is technically correct, in that the file is modified, but it's friendlier to the user if we further differentiate the case of a deleted file (especially because this output looks a lot like "diff --name-status", which would also make the distinction). Similarly, we can distinguish typechanges ("T") and intent-to-add files ("A"), both of which appear as just "M" in the current output. The plumbing output for all cases remains "needs update" for historical compatibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:13:08 +01:00			`deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");`
			`typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");`
			`added_fmt = (in_porcelain ? "A\t%s\n" : "%s needs update\n");`
refresh_index: rename format variables When refreshing the index, for modified (or unmerged) files we will print "needs update" (or "needs merge") for plumbing, or line similar to the output from "diff --name-status" for porcelain. The variables holding which type of message to show are named after the plumbing messages. However, as we begin to differentiate more cases at the porcelain level (with the plumbing message staying the same), that naming scheme will become awkward. Instead, name the variables after which case we found (modified or unmerged), not what we will output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:11:28 +01:00			`unmerged_fmt = (in_porcelain ? "U\t%s\n" : "%s: needs merge\n");`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`for (i = 0; i < istate->cache_nr; i++) {`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`struct cache_entry ce, new;`
Propagate cache error internal to refresh_cache() via parameter. The function refresh_cache() is the only user of cache_errno that switches its behaviour based on what internal function refresh_cache_entry() finds; pass the error status back in a parameter passed down to it, to get rid of the global variable cache_errno. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 06:34:34 +02:00			`int cache_errno = 0;`
refresh_index: make porcelain output more specific If you have a deleted file and a porcelain refreshes the cache, we print: Unstaged changes after reset: M file This is technically correct, in that the file is modified, but it's friendlier to the user if we further differentiate the case of a deleted file (especially because this output looks a lot like "diff --name-status", which would also make the distinction). Similarly, we can distinguish typechanges ("T") and intent-to-add files ("A"), both of which appear as just "M" in the current output. The plumbing output for all cases remains "needs update" for historical compatibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:13:08 +01:00			`int changed = 0;`
refresh_index: do not show unmerged path that is outside pathspec When running "git add --refresh <pathspec>", we incorrectly showed the path that is unmerged even if it is outside the specified pathspec, even though we did honor pathspec and refreshed only the paths that matched. Note that this cange does not affect "git update-index --refresh"; for hysterical raisins, it does not take a pathspec (it takes real paths) and more importantly itss command line options are parsed and executed one by one as they are encountered, so "git update-index --refresh foo" means "first refresh the index, and then update the entry 'foo' by hashing the contents in file 'foo'", not "refresh only entry 'foo'". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-17 19:11:05 +01:00			`int filtered = 0;`
Propagate cache error internal to refresh_cache() via parameter. The function refresh_cache() is the only user of cache_errno that switches its behaviour based on what internal function refresh_cache_entry() finds; pass the error status back in a parameter passed down to it, to get rid of the global variable cache_errno. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 06:34:34 +02:00
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`ce = istate->cache[i];`
Teach update-index about --ignore-submodules Like with the diff machinery, update-index should sometimes just ignore submodules (e.g. to determine a clean state before a rebase). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-14 19:03:45 +02:00			`if (ignore_submodules && S_ISGITLINK(ce->ce_mode))`
			`continue;`

refresh_index: do not show unmerged path that is outside pathspec When running "git add --refresh <pathspec>", we incorrectly showed the path that is unmerged even if it is outside the specified pathspec, even though we did honor pathspec and refreshed only the paths that matched. Note that this cange does not affect "git update-index --refresh"; for hysterical raisins, it does not take a pathspec (it takes real paths) and more importantly itss command line options are parsed and executed one by one as they are encountered, so "git update-index --refresh foo" means "first refresh the index, and then update the entry 'foo' by hashing the contents in file 'foo'", not "refresh only entry 'foo'". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-17 19:11:05 +01:00			`if (pathspec &&`
convert refresh_index to take struct pathspec Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-14 10:35:54 +02:00			`!match_pathspec_depth(pathspec, ce->name, ce_namelen(ce), 0, seen))`
refresh_index: do not show unmerged path that is outside pathspec When running "git add --refresh <pathspec>", we incorrectly showed the path that is unmerged even if it is outside the specified pathspec, even though we did honor pathspec and refreshed only the paths that matched. Note that this cange does not affect "git update-index --refresh"; for hysterical raisins, it does not take a pathspec (it takes real paths) and more importantly itss command line options are parsed and executed one by one as they are encountered, so "git update-index --refresh foo" means "first refresh the index, and then update the entry 'foo' by hashing the contents in file 'foo'", not "refresh only entry 'foo'". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-17 19:11:05 +01:00			`filtered = 1;`

Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`if (ce_stage(ce)) {`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`while ((i < istate->cache_nr) &&`
			`! strcmp(istate->cache[i]->name, ce->name))`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`i++;`
			`i--;`
			`if (allow_unmerged)`
			`continue;`
refresh_index: do not show unmerged path that is outside pathspec When running "git add --refresh <pathspec>", we incorrectly showed the path that is unmerged even if it is outside the specified pathspec, even though we did honor pathspec and refreshed only the paths that matched. Note that this cange does not affect "git update-index --refresh"; for hysterical raisins, it does not take a pathspec (it takes real paths) and more importantly itss command line options are parsed and executed one by one as they are encountered, so "git update-index --refresh foo" means "first refresh the index, and then update the entry 'foo' by hashing the contents in file 'foo'", not "refresh only entry 'foo'". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-17 19:11:05 +01:00			`if (!filtered)`
			`show_file(unmerged_fmt, ce->name, in_porcelain,`
			`&first, header_msg);`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`has_errors = 1;`
			`continue;`
			`}`

refresh_index: do not show unmerged path that is outside pathspec When running "git add --refresh <pathspec>", we incorrectly showed the path that is unmerged even if it is outside the specified pathspec, even though we did honor pathspec and refreshed only the paths that matched. Note that this cange does not affect "git update-index --refresh"; for hysterical raisins, it does not take a pathspec (it takes real paths) and more importantly itss command line options are parsed and executed one by one as they are encountered, so "git update-index --refresh foo" means "first refresh the index, and then update the entry 'foo' by hashing the contents in file 'foo'", not "refresh only entry 'foo'". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-17 19:11:05 +01:00			`if (filtered)`
git-add: Add support for --refresh option. This allows to refresh only a subset of the project files, based on the specified pathspecs. Signed-off-by: Alexandre Julliard <julliard@winehq.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-08-11 23:59:01 +02:00			`continue;`

refresh_index: make porcelain output more specific If you have a deleted file and a porcelain refreshes the cache, we print: Unstaged changes after reset: M file This is technically correct, in that the file is modified, but it's friendlier to the user if we further differentiate the case of a deleted file (especially because this output looks a lot like "diff --name-status", which would also make the distinction). Similarly, we can distinguish typechanges ("T") and intent-to-add files ("A"), both of which appear as just "M" in the current output. The plumbing output for all cases remains "needs update" for historical compatibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:13:08 +01:00			`new = refresh_cache_ent(istate, ce, options, &cache_errno, &changed);`
Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`if (new == ce)`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`continue;`
Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`if (!new) {`
refresh_index: make porcelain output more specific If you have a deleted file and a porcelain refreshes the cache, we print: Unstaged changes after reset: M file This is technically correct, in that the file is modified, but it's friendlier to the user if we further differentiate the case of a deleted file (especially because this output looks a lot like "diff --name-status", which would also make the distinction). Similarly, we can distinguish typechanges ("T") and intent-to-add files ("A"), both of which appear as just "M" in the current output. The plumbing output for all cases remains "needs update" for historical compatibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:13:08 +01:00			`const char *fmt;`

Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`if (not_new && cache_errno == ENOENT)`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`continue;`
Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`if (really && cache_errno == EINVAL) {`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`/* If we are doing --really-refresh that`
			`* means the index is not valid anymore.`
			`*/`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`ce->ce_flags &= ~CE_VALID;`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`istate->cache_changed = 1;`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`}`
			`if (quiet)`
			`continue;`
refresh_index: make porcelain output more specific If you have a deleted file and a porcelain refreshes the cache, we print: Unstaged changes after reset: M file This is technically correct, in that the file is modified, but it's friendlier to the user if we further differentiate the case of a deleted file (especially because this output looks a lot like "diff --name-status", which would also make the distinction). Similarly, we can distinguish typechanges ("T") and intent-to-add files ("A"), both of which appear as just "M" in the current output. The plumbing output for all cases remains "needs update" for historical compatibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:13:08 +01:00
			`if (cache_errno == ENOENT)`
			`fmt = deleted_fmt;`
			`else if (ce->ce_flags & CE_INTENT_TO_ADD)`
			`fmt = added_fmt; /* must be before other checks */`
			`else if (changed & TYPE_CHANGED)`
			`fmt = typechange_fmt;`
			`else`
			`fmt = modified_fmt;`
			`show_file(fmt,`
			`ce->name, in_porcelain, &first, header_msg);`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`has_errors = 1;`
			`continue;`
			`}`
Create pathname-based hash-table lookup into index This creates a hash index of every single file added to the index. Right now that hash index isn't actually used for much: I implemented a "cache_name_exists()" function that uses it to efficiently look up a filename in the index without having to do the O(logn) binary search, but quite frankly, that's not why this patch is interesting. No, the whole and only reason to create the hash of the filenames in the index is that by modifying the hash function, you can fairly easily do things like making it always hash equivalent names into the same bucket. That, in turn, means that suddenly questions like "does this name exist in the index under an _equivalent_ name?" becomes much much cheaper. Guiding principles behind this patch: - it shouldn't be too costly. In fact, my primary goal here was to actually speed up "git commit" with a fully populated kernel tree, by being faster at checking whether a file already existed in the index. I did succeed, but only barely: Best before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.255s user 0m0.168s sys 0m0.088s Best after: [torvalds@woody linux]$ time ~/git/git commit > /dev/null real 0m0.233s user 0m0.144s sys 0m0.088s so some things are actually faster (~8%). Caveat: that's really the best case. Other things are invariably going to be slightly slower, since we populate that index cache, and quite frankly, few things really use it to look things up. That said, the cost is really quite small. The worst case is probably doing a "git ls-files", which will do very little except puopulate the index, and never actually looks anything up in it, just lists it. Before: [torvalds@woody linux]$ time git ls-files > /dev/null real 0m0.016s user 0m0.016s sys 0m0.000s After: [torvalds@woody linux]$ time ~/git/git ls-files > /dev/null real 0m0.021s user 0m0.012s sys 0m0.008s and while the thing has really gotten relatively much slower, we're still talking about something almost unmeasurable (eg 5ms). And that really should be pretty much the worst case. So we lose 5ms on one "benchmark", but win 22ms on another. Pick your poison - this patch has the advantage that it will _likely_ speed up the cases that are complex and expensive more than it slows down the cases that are already so fast that nobody cares. But if you look at relative speedups/slowdowns, it doesn't look so good. - It should be simple and clean The code may be a bit subtle (the reasons I do hash removal the way I do etc), but it re-uses the existing hash.c files, so it really is fairly small and straightforward apart from a few odd details. Now, this patch on its own doesn't really do much, but I think it's worth looking at, if only because if done correctly, the name hashing really can make an improvement to the whole issue of "do we have a filename that looks like this in the index already". And at least it gets real testing by being used even by default (ie there is a real use-case for it even without any insane filesystems). NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just not ashamed of it enough to really care. I took all the numbers out of my nether regions - I'm sure it's good enough that it works in practice, but the whole point was that you can make a really much fancier hash that hashes characters not directly, but by their upper-case value or something like that, and thus you get a case-insensitive hash, while still keeping the name and the index itself totally case sensitive. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-23 03:41:14 +01:00
			`replace_index_entry(istate, i, new);`
Libify the index refresh logic This cleans up and libifies the "git update-index --[really-]refresh" functionality. This will be eventually required for eventually doing the "commit" and "status" commands as built-ins. It really just moves "refresh_index()" from update-index.c to read-cache.c, but it also has to change the calling convention so that the function uses a "unsigned int flags" argument instead of various static flags variables for passing down the information about whether to be quiet or not, and allow unmerged entries etc. That actually cleans up update-index.c too, since it turns out that all those flags were really specific to that one function of the index update, so they shouldn't have had file-scope visibility even before. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-19 18:56:35 +02:00			`}`
			`return has_errors;`
			`}`

read-cache.c: mark file-local functions static Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-12 07:29:35 +01:00			`static struct cache_entry refresh_cache_entry(struct cache_entry ce, int really)`
Propagate cache error internal to refresh_cache() via parameter. The function refresh_cache() is the only user of cache_errno that switches its behaviour based on what internal function refresh_cache_entry() finds; pass the error status back in a parameter passed down to it, to get rid of the global variable cache_errno. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 06:34:34 +02:00			`{`
read-cache: let refresh_cache_ent pass up changed flags This will enable refresh_cache to differentiate more cases of modification (such as typechange) when telling the user what isn't fresh. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 12:11:08 +01:00			`return refresh_cache_ent(&the_index, ce, really, NULL, NULL);`
Propagate cache error internal to refresh_cache() via parameter. The function refresh_cache() is the only user of cache_errno that switches its behaviour based on what internal function refresh_cache_entry() finds; pass the error status back in a parameter passed down to it, to get rid of the global variable cache_errno. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 06:34:34 +02:00			`}`

cache.h: hide on-disk index details The on-disk format of the index file is a detail whose implementation is neatly encapsulated in read-cache.c; there is no need to expose it to the general public that include the cache.h header file. Also add a prominent mark to read-cache.c to delineate the parts that deal with the index file I/O routines from the remainder of the file. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:09 +02:00
			`/*****************************************************************`
			`* Index File I/O`
			`*****************************************************************/`

read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`#define INDEX_FORMAT_DEFAULT 3`

cache.h: hide on-disk index details The on-disk format of the index file is a detail whose implementation is neatly encapsulated in read-cache.c; there is no need to expose it to the general public that include the cache.h header file. Also add a prominent mark to read-cache.c to delineate the parts that deal with the index file I/O routines from the remainder of the file. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:09 +02:00			`/*`
			`* dev/ino/uid/gid/size are also just tracked to the low 32 bits`
			`* Again - this is just a (very strong in practice) heuristic that`
			`* the inode hasn't changed.`
			`*`
			`* We save the fields in big-endian order to allow using the`
			`* index file over NFS transparently.`
			`*/`
			`struct ondisk_cache_entry {`
			`struct cache_time ctime;`
			`struct cache_time mtime;`
read-cache: use fixed width integer types Use the fixed width integer types uint16_t and uint32_t for on-disk structures; unsigned short and unsigned int do not have a guaranteed size. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-18 21:41:51 +02:00			`uint32_t dev;`
			`uint32_t ino;`
			`uint32_t mode;`
			`uint32_t uid;`
			`uint32_t gid;`
			`uint32_t size;`
cache.h: hide on-disk index details The on-disk format of the index file is a detail whose implementation is neatly encapsulated in read-cache.c; there is no need to expose it to the general public that include the cache.h header file. Also add a prominent mark to read-cache.c to delineate the parts that deal with the index file I/O routines from the remainder of the file. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:09 +02:00			`unsigned char sha1[20];`
read-cache: use fixed width integer types Use the fixed width integer types uint16_t and uint32_t for on-disk structures; unsigned short and unsigned int do not have a guaranteed size. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-18 21:41:51 +02:00			`uint16_t flags;`
cache.h: hide on-disk index details The on-disk format of the index file is a detail whose implementation is neatly encapsulated in read-cache.c; there is no need to expose it to the general public that include the cache.h header file. Also add a prominent mark to read-cache.c to delineate the parts that deal with the index file I/O routines from the remainder of the file. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:09 +02:00			`char name[FLEX_ARRAY]; /* more */`
			`};`

			`/*`
			`* This struct is used when CE_EXTENDED bit is 1`
			`* The struct must match ondisk_cache_entry exactly from`
			`* ctime till flags`
			`*/`
			`struct ondisk_cache_entry_extended {`
			`struct cache_time ctime;`
			`struct cache_time mtime;`
read-cache: use fixed width integer types Use the fixed width integer types uint16_t and uint32_t for on-disk structures; unsigned short and unsigned int do not have a guaranteed size. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-18 21:41:51 +02:00			`uint32_t dev;`
			`uint32_t ino;`
			`uint32_t mode;`
			`uint32_t uid;`
			`uint32_t gid;`
			`uint32_t size;`
cache.h: hide on-disk index details The on-disk format of the index file is a detail whose implementation is neatly encapsulated in read-cache.c; there is no need to expose it to the general public that include the cache.h header file. Also add a prominent mark to read-cache.c to delineate the parts that deal with the index file I/O routines from the remainder of the file. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:09 +02:00			`unsigned char sha1[20];`
read-cache: use fixed width integer types Use the fixed width integer types uint16_t and uint32_t for on-disk structures; unsigned short and unsigned int do not have a guaranteed size. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-18 21:41:51 +02:00			`uint16_t flags;`
			`uint16_t flags2;`
cache.h: hide on-disk index details The on-disk format of the index file is a detail whose implementation is neatly encapsulated in read-cache.c; there is no need to expose it to the general public that include the cache.h header file. Also add a prominent mark to read-cache.c to delineate the parts that deal with the index file I/O routines from the remainder of the file. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:09 +02:00			`char name[FLEX_ARRAY]; /* more */`
			`};`

read-cache.c: read prefix-compressed names in index on-disk version v4 Because the entries are sorted by path, adjacent entries in the index tend to share the leading components of them, and it makes sense to only store the differences in later entries. In the v4 on-disk format of the index, each on-disk cache entry stores the number of bytes to be stripped from the end of the previous name, and the bytes to append to the result, to come up with its name. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:15 +02:00			`/* These are only used for v3 or lower */`
cache.h: hide on-disk index details The on-disk format of the index file is a detail whose implementation is neatly encapsulated in read-cache.c; there is no need to expose it to the general public that include the cache.h header file. Also add a prominent mark to read-cache.c to delineate the parts that deal with the index file I/O routines from the remainder of the file. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:09 +02:00			`#define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,name) + (len) + 8) & ~7)`
			`#define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len)`
			`#define ondisk_cache_entry_extended_size(len) align_flex_name(ondisk_cache_entry_extended,len)`
			`#define ondisk_ce_size(ce) (((ce)->ce_flags & CE_EXTENDED) ? \`
			`ondisk_cache_entry_extended_size(ce_namelen(ce)) : \`
			`ondisk_cache_entry_size(ce_namelen(ce)))`

index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`static int verify_hdr(struct cache_header *hdr, unsigned long size)`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00			`{`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA_CTX c;`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`unsigned char sha1[20];`
read-cache.c: report the header version we do not understand Instead of just saying "bad index version", report the value we read from the disk. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:12 +02:00			`int hdr_version;`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00
Convert the index file reading/writing to use network byte order. This allows using a git tree over NFS with different byte order, and makes it possible to just copy a fully populated repository and have the end result immediately usable (needing just a refresh to update the stat information). 2005-04-15 19:44:27 +02:00			`if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00			`return error("bad signature");`
read-cache.c: report the header version we do not understand Instead of just saying "bad index version", report the value we read from the disk. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:12 +02:00			`hdr_version = ntohl(hdr->hdr_version);`
read-cache.c: use INDEX_FORMAT_{LB,UB} in verify_hdr() 9d22778 (read-cache.c: write prefix-compressed names in the index - 2012-04-04) defined these. Interestingly, they were not used by read-cache.c, or anywhere in that patch. They were used in builtin/update-index.c later for checking supported index versions. Use them here too. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-22 13:09:24 +01:00			`if (hdr_version < INDEX_FORMAT_LB \|\| INDEX_FORMAT_UB < hdr_version)`
read-cache.c: report the header version we do not understand Instead of just saying "bad index version", report the value we read from the disk. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:12 +02:00			`return error("bad index version %d", hdr_version);`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Init(&c);`
			`git_SHA1_Update(&c, hdr, size - 20);`
			`git_SHA1_Final(sha1, &c);`
Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char sha1, const unsigned char sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-17 20:54:57 +02:00			`if (hashcmp(sha1, (unsigned char *)hdr + size - 20))`
Make the sha1 of the index file go at the very end of the file. This allows us to both calculate it and verify it faster. 2005-04-20 21:36:41 +02:00			`return error("bad index file sha1 signature");`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00			`return 0;`
			`}`

Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`static int read_index_extension(struct index_state *istate,`
			`const char ext, void data, unsigned long sz)`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`{`
			`switch (CACHE_EXT(ext)) {`
			`case CACHE_EXT_TREE:`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`istate->cache_tree = cache_tree_read(data, sz);`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`break;`
resolve-undo: record resolved conflicts in a new index extension section When resolving a conflict using "git add" to create a stage #0 entry, or "git rm" to remove entries at higher stages, remove_index_entry_at() function is eventually called to remove unmerged (i.e. higher stage) entries from the index. Introduce a "resolve_undo_info" structure and keep track of the removed cache entries, and save it in a new index extension section in the index_state. Operations like "read-tree -m", "merge", "checkout [-m] <branch>" and "reset" are signs that recorded information in the index is no longer necessary. The data is removed from the index extension when operations start; they may leave conflicted entries in the index, and later user actions like "git add" will record their conflicted states afresh. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-25 09:30:51 +01:00			`case CACHE_EXT_RESOLVE_UNDO:`
			`istate->resolve_undo = resolve_undo_read(data, sz);`
			`break;`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`default:`
			`if (ext < 'A' \|\| 'Z' < ext)`
			`return error("index uses %.4s extension, which we do not understand",`
			`ext);`
			`fprintf(stderr, "ignoring %.4s extension\n", ext);`
			`break;`
			`}`
			`return 0;`
			`}`

Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`int read_index(struct index_state *istate)`
Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`{`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`return read_index_from(istate, get_index_file());`
Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`}`

read-cache.c: allow unaligned mapping of the index file Both the on-disk format v2 and v3 pads the "name" field to the multiple of eight to make sure that various quantities in network long/short type can be accessed with ntohl/ntohs without having to worry about alignment, but this forces us to waste disk I/O bandwidth. Introduce ntoh_s()/ntoh_l() macros that the callers can use as if they were the regular ntohs()/ntohl() on a field that may not be aligned correctly. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:10 +02:00			`#ifndef NEEDS_ALIGNED_ACCESS`
			`#define ntoh_s(var) ntohs(var)`
			`#define ntoh_l(var) ntohl(var)`
			`#else`
			`static inline uint16_t ntoh_s_force_align(void *p)`
			`{`
			`uint16_t x;`
			`memcpy(&x, p, sizeof(x));`
			`return ntohs(x);`
			`}`
			`static inline uint32_t ntoh_l_force_align(void *p)`
			`{`
			`uint32_t x;`
			`memcpy(&x, p, sizeof(x));`
			`return ntohl(x);`
			`}`
			`#define ntoh_s(var) ntoh_s_force_align(&(var))`
			`#define ntoh_l(var) ntoh_l_force_align(&(var))`
			`#endif`

read-cache.c: move code to copy ondisk to incore cache to a helper function This makes the change in a later patch look less scary. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:13 +02:00			`static struct cache_entry cache_entry_from_ondisk(struct ondisk_cache_entry ondisk,`
			`unsigned int flags,`
			`const char *name,`
			`size_t len)`
			`{`
			`struct cache_entry *ce = xmalloc(cache_entry_size(len));`

Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`ce->ce_stat_data.sd_ctime.sec = ntoh_l(ondisk->ctime.sec);`
			`ce->ce_stat_data.sd_mtime.sec = ntoh_l(ondisk->mtime.sec);`
			`ce->ce_stat_data.sd_ctime.nsec = ntoh_l(ondisk->ctime.nsec);`
			`ce->ce_stat_data.sd_mtime.nsec = ntoh_l(ondisk->mtime.nsec);`
			`ce->ce_stat_data.sd_dev = ntoh_l(ondisk->dev);`
			`ce->ce_stat_data.sd_ino = ntoh_l(ondisk->ino);`
read-cache.c: move code to copy ondisk to incore cache to a helper function This makes the change in a later patch look less scary. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:13 +02:00			`ce->ce_mode = ntoh_l(ondisk->mode);`
Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`ce->ce_stat_data.sd_uid = ntoh_l(ondisk->uid);`
			`ce->ce_stat_data.sd_gid = ntoh_l(ondisk->gid);`
			`ce->ce_stat_data.sd_size = ntoh_l(ondisk->size);`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`ce->ce_flags = flags & ~CE_NAMEMASK;`
			`ce->ce_namelen = len;`
read-cache.c: move code to copy ondisk to incore cache to a helper function This makes the change in a later patch look less scary. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:13 +02:00			`hashcpy(ce->sha1, ondisk->sha1);`
			`memcpy(ce->name, name, len);`
			`ce->name[len] = '\0';`
			`return ce;`
			`}`

read-cache.c: read prefix-compressed names in index on-disk version v4 Because the entries are sorted by path, adjacent entries in the index tend to share the leading components of them, and it makes sense to only store the differences in later entries. In the v4 on-disk format of the index, each on-disk cache entry stores the number of bytes to be stripped from the end of the previous name, and the bytes to append to the result, to come up with its name. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:15 +02:00			`/*`
			`* Adjacent cache entries tend to share the leading paths, so it makes`
			`* sense to only store the differences in later entries. In the v4`
			`* on-disk format of the index, each on-disk cache entry stores the`
			`* number of bytes to be stripped from the end of the previous name,`
			`* and the bytes to append to the result, to come up with its name.`
			`*/`
			`static unsigned long expand_name_field(struct strbuf name, const char cp_)`
			`{`
			`const unsigned char ep, cp = (const unsigned char *)cp_;`
			`size_t len = decode_varint(&cp);`

			`if (name->len < len)`
			`die("malformed name field in the index");`
			`strbuf_remove(name, name->len - len, len);`
			`for (ep = cp; *ep; ep++)`
			`; /* find the end */`
			`strbuf_add(name, cp, ep - cp);`
			`return (const char *)ep + 1 - cp_;`
			`}`

read-cache.c: make create_from_disk() report number of bytes it consumed The function is the one that is reading from the data stream. It only is natural to make it responsible for reporting this number, not the caller. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:11 +02:00			`static struct cache_entry create_from_disk(struct ondisk_cache_entry ondisk,`
read-cache.c: read prefix-compressed names in index on-disk version v4 Because the entries are sorted by path, adjacent entries in the index tend to share the leading components of them, and it makes sense to only store the differences in later entries. In the v4 on-disk format of the index, each on-disk cache entry stores the number of bytes to be stripped from the end of the previous name, and the bytes to append to the result, to come up with its name. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:15 +02:00			`unsigned long *ent_size,`
			`struct strbuf *previous_name)`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`{`
read-cache.c: allocate index entries individually The code to estimate the in-memory size of the index based on its on-disk representation is subtly wrong for certain architecture-dependent struct layouts. Instead of fixing it, replace the code to keep the index entries in a single large block of memory and allocate each entry separately instead. This is both simpler and more flexible, as individual entries can now be freed. Actually using that added flexibility is left for a later patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 23:59:14 +02:00			`struct cache_entry *ce;`
index: be careful when handling long names We currently use lower 12-bit (masked with CE_NAMEMASK) in the ce_flags field to store the length of the name in cache_entry, without checking the length parameter given to create_ce_flags(). This can make us store incorrect length. Currently we are mostly protected by the fact that many codepaths first copy the path in a variable of size PATH_MAX, which typically is 4096 that happens to match the limit, but that feels like a bug waiting to happen. Besides, that would not allow us to shorten the width of CE_NAMEMASK to use the bits for new flags. This redefines the meaning of the name length stored in the cache_entry. A name that does not fit is represented by storing CE_NAMEMASK in the field, and the actual length needs to be computed by actually counting the bytes in the name[] field. This way, only the unusually long paths need to suffer. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-19 08:42:00 +01:00			`size_t len;`
Extend index to save more flags The on-disk format of index only saves 16 bit flags, nearly all have been used. The last bit (CE_EXTENDED) is used to for future extension. This patch extends index entry format to save more flags in future. The new entry format will be used when CE_EXTENDED bit is 1. Because older implementation may not understand CE_EXTENDED bit and misread the new format, if there is any extended entry in index, index header version will turn 3, which makes it incompatible for older git. If there is none, header version will return to 2 again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 06:04:01 +02:00			`const char *name;`
read-cache.c: allocate index entries individually The code to estimate the in-memory size of the index based on its on-disk representation is subtly wrong for certain architecture-dependent struct layouts. Instead of fixing it, replace the code to keep the index entries in a single large block of memory and allocate each entry separately instead. This is both simpler and more flexible, as individual entries can now be freed. Actually using that added flexibility is left for a later patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 23:59:14 +02:00			`unsigned int flags;`
index: be careful when handling long names We currently use lower 12-bit (masked with CE_NAMEMASK) in the ce_flags field to store the length of the name in cache_entry, without checking the length parameter given to create_ce_flags(). This can make us store incorrect length. Currently we are mostly protected by the fact that many codepaths first copy the path in a variable of size PATH_MAX, which typically is 4096 that happens to match the limit, but that feels like a bug waiting to happen. Besides, that would not allow us to shorten the width of CE_NAMEMASK to use the bits for new flags. This redefines the meaning of the name length stored in the cache_entry. A name that does not fit is represented by storing CE_NAMEMASK in the field, and the actual length needs to be computed by actually counting the bytes in the name[] field. This way, only the unusually long paths need to suffer. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-19 08:42:00 +01:00
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`/* On-disk flags are just 16 bits */`
read-cache.c: allow unaligned mapping of the index file Both the on-disk format v2 and v3 pads the "name" field to the multiple of eight to make sure that various quantities in network long/short type can be accessed with ntohl/ntohs without having to worry about alignment, but this forces us to waste disk I/O bandwidth. Introduce ntoh_s()/ntoh_l() macros that the callers can use as if they were the regular ntohs()/ntohl() on a field that may not be aligned correctly. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:10 +02:00			`flags = ntoh_s(ondisk->flags);`
read-cache.c: allocate index entries individually The code to estimate the in-memory size of the index based on its on-disk representation is subtly wrong for certain architecture-dependent struct layouts. Instead of fixing it, replace the code to keep the index entries in a single large block of memory and allocate each entry separately instead. This is both simpler and more flexible, as individual entries can now be freed. Actually using that added flexibility is left for a later patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 23:59:14 +02:00			`len = flags & CE_NAMEMASK;`
index: be careful when handling long names We currently use lower 12-bit (masked with CE_NAMEMASK) in the ce_flags field to store the length of the name in cache_entry, without checking the length parameter given to create_ce_flags(). This can make us store incorrect length. Currently we are mostly protected by the fact that many codepaths first copy the path in a variable of size PATH_MAX, which typically is 4096 that happens to match the limit, but that feels like a bug waiting to happen. Besides, that would not allow us to shorten the width of CE_NAMEMASK to use the bits for new flags. This redefines the meaning of the name length stored in the cache_entry. A name that does not fit is represented by storing CE_NAMEMASK in the field, and the actual length needs to be computed by actually counting the bytes in the name[] field. This way, only the unusually long paths need to suffer. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-19 08:42:00 +01:00
read-cache.c: allocate index entries individually The code to estimate the in-memory size of the index based on its on-disk representation is subtly wrong for certain architecture-dependent struct layouts. Instead of fixing it, replace the code to keep the index entries in a single large block of memory and allocate each entry separately instead. This is both simpler and more flexible, as individual entries can now be freed. Actually using that added flexibility is left for a later patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 23:59:14 +02:00			`if (flags & CE_EXTENDED) {`
Extend index to save more flags The on-disk format of index only saves 16 bit flags, nearly all have been used. The last bit (CE_EXTENDED) is used to for future extension. This patch extends index entry format to save more flags in future. The new entry format will be used when CE_EXTENDED bit is 1. Because older implementation may not understand CE_EXTENDED bit and misread the new format, if there is any extended entry in index, index header version will turn 3, which makes it incompatible for older git. If there is none, header version will return to 2 again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 06:04:01 +02:00			`struct ondisk_cache_entry_extended *ondisk2;`
			`int extended_flags;`
			`ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;`
read-cache.c: allow unaligned mapping of the index file Both the on-disk format v2 and v3 pads the "name" field to the multiple of eight to make sure that various quantities in network long/short type can be accessed with ntohl/ntohs without having to worry about alignment, but this forces us to waste disk I/O bandwidth. Introduce ntoh_s()/ntoh_l() macros that the callers can use as if they were the regular ntohs()/ntohl() on a field that may not be aligned correctly. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:10 +02:00			`extended_flags = ntoh_s(ondisk2->flags2) << 16;`
Extend index to save more flags The on-disk format of index only saves 16 bit flags, nearly all have been used. The last bit (CE_EXTENDED) is used to for future extension. This patch extends index entry format to save more flags in future. The new entry format will be used when CE_EXTENDED bit is 1. Because older implementation may not understand CE_EXTENDED bit and misread the new format, if there is any extended entry in index, index header version will turn 3, which makes it incompatible for older git. If there is none, header version will return to 2 again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 06:04:01 +02:00			`/* We do not yet understand any bit out of CE_EXTENDED_FLAGS */`
			`if (extended_flags & ~CE_EXTENDED_FLAGS)`
			`die("Unknown index entry format %08x", extended_flags);`
read-cache.c: allocate index entries individually The code to estimate the in-memory size of the index based on its on-disk representation is subtly wrong for certain architecture-dependent struct layouts. Instead of fixing it, replace the code to keep the index entries in a single large block of memory and allocate each entry separately instead. This is both simpler and more flexible, as individual entries can now be freed. Actually using that added flexibility is left for a later patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 23:59:14 +02:00			`flags \|= extended_flags;`
Extend index to save more flags The on-disk format of index only saves 16 bit flags, nearly all have been used. The last bit (CE_EXTENDED) is used to for future extension. This patch extends index entry format to save more flags in future. The new entry format will be used when CE_EXTENDED bit is 1. Because older implementation may not understand CE_EXTENDED bit and misread the new format, if there is any extended entry in index, index header version will turn 3, which makes it incompatible for older git. If there is none, header version will return to 2 again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 06:04:01 +02:00			`name = ondisk2->name;`
			`}`
			`else`
			`name = ondisk->name;`

read-cache.c: read prefix-compressed names in index on-disk version v4 Because the entries are sorted by path, adjacent entries in the index tend to share the leading components of them, and it makes sense to only store the differences in later entries. In the v4 on-disk format of the index, each on-disk cache entry stores the number of bytes to be stripped from the end of the previous name, and the bytes to append to the result, to come up with its name. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:15 +02:00			`if (!previous_name) {`
			`/* v3 and earlier */`
			`if (len == CE_NAMEMASK)`
			`len = strlen(name);`
			`ce = cache_entry_from_ondisk(ondisk, flags, name, len);`

			`*ent_size = ondisk_ce_size(ce);`
			`} else {`
			`unsigned long consumed;`
			`consumed = expand_name_field(previous_name, name);`
			`ce = cache_entry_from_ondisk(ondisk, flags,`
			`previous_name->buf,`
			`previous_name->len);`

			`ent_size = (name - ((char )ondisk)) + consumed;`
			`}`
read-cache.c: allocate index entries individually The code to estimate the in-memory size of the index based on its on-disk representation is subtly wrong for certain architecture-dependent struct layouts. Instead of fixing it, replace the code to keep the index entries in a single large block of memory and allocate each entry separately instead. This is both simpler and more flexible, as individual entries can now be freed. Actually using that added flexibility is left for a later patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 23:59:14 +02:00			`return ce;`
Create pathname-based hash-table lookup into index This creates a hash index of every single file added to the index. Right now that hash index isn't actually used for much: I implemented a "cache_name_exists()" function that uses it to efficiently look up a filename in the index without having to do the O(logn) binary search, but quite frankly, that's not why this patch is interesting. No, the whole and only reason to create the hash of the filenames in the index is that by modifying the hash function, you can fairly easily do things like making it always hash equivalent names into the same bucket. That, in turn, means that suddenly questions like "does this name exist in the index under an _equivalent_ name?" becomes much much cheaper. Guiding principles behind this patch: - it shouldn't be too costly. In fact, my primary goal here was to actually speed up "git commit" with a fully populated kernel tree, by being faster at checking whether a file already existed in the index. I did succeed, but only barely: Best before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.255s user 0m0.168s sys 0m0.088s Best after: [torvalds@woody linux]$ time ~/git/git commit > /dev/null real 0m0.233s user 0m0.144s sys 0m0.088s so some things are actually faster (~8%). Caveat: that's really the best case. Other things are invariably going to be slightly slower, since we populate that index cache, and quite frankly, few things really use it to look things up. That said, the cost is really quite small. The worst case is probably doing a "git ls-files", which will do very little except puopulate the index, and never actually looks anything up in it, just lists it. Before: [torvalds@woody linux]$ time git ls-files > /dev/null real 0m0.016s user 0m0.016s sys 0m0.000s After: [torvalds@woody linux]$ time ~/git/git ls-files > /dev/null real 0m0.021s user 0m0.012s sys 0m0.008s and while the thing has really gotten relatively much slower, we're still talking about something almost unmeasurable (eg 5ms). And that really should be pretty much the worst case. So we lose 5ms on one "benchmark", but win 22ms on another. Pick your poison - this patch has the advantage that it will _likely_ speed up the cases that are complex and expensive more than it slows down the cases that are already so fast that nobody cares. But if you look at relative speedups/slowdowns, it doesn't look so good. - It should be simple and clean The code may be a bit subtle (the reasons I do hash removal the way I do etc), but it re-uses the existing hash.c files, so it really is fairly small and straightforward apart from a few odd details. Now, this patch on its own doesn't really do much, but I think it's worth looking at, if only because if done correctly, the name hashing really can make an improvement to the whole issue of "do we have a filename that looks like this in the index already". And at least it gets real testing by being used even by default (ie there is a real use-case for it even without any insane filesystems). NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just not ashamed of it enough to really care. I took all the numbers out of my nether regions - I'm sure it's good enough that it works in practice, but the whole point was that you can make a really much fancier hash that hashes characters not directly, but by their upper-case value or something like that, and thus you get a case-insensitive hash, while still keeping the name and the index itself totally case sensitive. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-23 03:41:14 +01:00			`}`

Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`/* remember to discard_cache() before reading a different cache! */`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`int read_index_from(struct index_state istate, const char path)`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00			`{`
			`int fd, i;`
			`struct stat st;`
read-cache.c: allocate index entries individually The code to estimate the in-memory size of the index based on its on-disk representation is subtly wrong for certain architecture-dependent struct layouts. Instead of fixing it, replace the code to keep the index entries in a single large block of memory and allocate each entry separately instead. This is both simpler and more flexible, as individual entries can now be freed. Actually using that added flexibility is left for a later patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 23:59:14 +02:00			`unsigned long src_offset;`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00			`struct cache_header *hdr;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`void *mmap;`
			`size_t mmap_size;`
read-cache.c: read prefix-compressed names in index on-disk version v4 Because the entries are sorted by path, adjacent entries in the index tend to share the leading components of them, and it makes sense to only store the differences in later entries. In the v4 on-disk format of the index, each on-disk cache entry stores the number of bytes to be stripped from the end of the previous name, and the bytes to append to the result, to come up with its name. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:15 +02:00			`struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00
unpack_trees(): protect the handcrafted in-core index from read_cache() unpack_trees() rebuilds the in-core index from scratch by allocating a new structure and finishing it off by copying the built one to the final index. The resulting in-core index is Ok for most use, but read_cache() does not recognize it as such. The function is meant to be no-op if you already have loaded the index, until you call discard_cache(). This change the way read_cache() detects an already initialized in-core index, by introducing an extra bit, and marks the handcrafted in-core index as initialized, to avoid this problem. A better fix in the longer term would be to change the read_cache() API so that it will always discard and re-read from the on-disk index to avoid confusion. But there are higher level API that have relied on the current semantics, and they and their users all need to get converted, which is outside the scope of 'maint' track. An example of such a higher level API is write_cache_as_tree(), which is used by git-write-tree as well as later Porcelains like git-merge, revert and cherry-pick. In the longer term, we should remove read_cache() from there and add one to cmd_write_tree(); other callers expect that the in-core index they prepared is what gets written as a tree so no other change is necessary for this particular codepath. The original version of this patch marked the index by pointing an otherwise wasted malloc'ed memory with o->result.alloc, but this version uses Linus's idea to use a new "initialized" bit, which is conceptually much cleaner. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-23 21:57:30 +02:00			`if (istate->initialized)`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`return istate->cache_nr;`
[PATCH] Better error reporting for "git status" Instead of "git status" ignoring (and hiding) potential errors from the "git-update-index" call, make it exit if it fails, and show the error. In order to do this, use the "-q" flag (to ignore not-up-to-date files) and add a new "--unmerged" flag that allows unmerged entries in the index without any errors. This also avoids marking the index "changed" if an entry isn't actually modified, and makes sure that we exit with an understandable error message if the index is corrupt or unreadable. "read_cache()" no longer returns an error for the caller to check. Finally, make die() and usage() exit with recognizable error codes, if we ever want to check the failure reason in scripts. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-01 22:24:27 +02:00
make USE_NSEC work as expected Since the filesystem ext4 is now defined as stable in Linux v2.6.28, and ext4 supports nanonsecond resolution timestamps natively, it is time to make USE_NSEC work as expected. This will make racy git situations less likely to happen. For 'git checkout' this means it will be less likely that we have to open, read the contents of the file into RAM, and check if file is really modified or not. The result sould be a litle less used CPU time, less pagefaults and a litle faster program, at least for 'git checkout'. Since the number of possible racy git situations would increase when disks gets faster, this patch would be more and more helpfull as times go by. For a fast Solid State Disk, this patch should be helpfull. Note that, when file operations starts to take less than 1 nanosecond, one would again start to get more racy git situations. For more info on racy git, see Documentation/technical/racy-git.txt For more info on ext4, see http://kernelnewbies.org/Ext4 Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-19 21:08:29 +01:00			`istate->timestamp.sec = 0;`
			`istate->timestamp.nsec = 0;`
Extract helper bits from c-merge-recursive work This backports the pieces that are not uncooked from the merge-recursive WIP we have seen earlier, to be used in git-mv rewritten in C. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-26 06:32:18 +02:00			`fd = open(path, O_RDONLY);`
[PATCH] Better error reporting for "git status" Instead of "git status" ignoring (and hiding) potential errors from the "git-update-index" call, make it exit if it fails, and show the error. In order to do this, use the "-q" flag (to ignore not-up-to-date files) and add a new "--unmerged" flag that allows unmerged entries in the index without any errors. This also avoids marking the index "changed" if an entry isn't actually modified, and makes sure that we exit with an understandable error message if the index is corrupt or unreadable. "read_cache()" no longer returns an error for the caller to check. Finally, make die() and usage() exit with recognizable error codes, if we ever want to check the failure reason in scripts. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-01 22:24:27 +02:00			`if (fd < 0) {`
			`if (errno == ENOENT)`
			`return 0;`
Convert existing die(..., strerror(errno)) to die_errno() Change calls to die(..., strerror(errno)) to use the new die_errno(). In the process, also make slight style adjustments: at least state _something_ about the function that failed (instead of just printing the pathname), and put paths in single quotes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-27 17:58:46 +02:00			`die_errno("index file open failed");`
[PATCH] Better error reporting for "git status" Instead of "git status" ignoring (and hiding) potential errors from the "git-update-index" call, make it exit if it fails, and show the error. In order to do this, use the "-q" flag (to ignore not-up-to-date files) and add a new "--unmerged" flag that allows unmerged entries in the index without any errors. This also avoids marking the index "changed" if an entry isn't actually modified, and makes sure that we exit with an understandable error message if the index is corrupt or unreadable. "read_cache()" no longer returns an error for the caller to check. Finally, make die() and usage() exit with recognizable error codes, if we ever want to check the failure reason in scripts. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-01 22:24:27 +02:00			`}`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00
read_cache_from(): small simplification This change 'opens' the code block which maps the index file into memory, making the code clearer and easier to read. Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-25 16:18:17 +02:00			`if (fstat(fd, &st))`
Convert existing die(..., strerror(errno)) to die_errno() Change calls to die(..., strerror(errno)) to use the new die_errno(). In the process, also make slight style adjustments: at least state _something_ about the function that failed (instead of just printing the pathname), and put paths in single quotes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-27 17:58:46 +02:00			`die_errno("cannot stat the open index");`
read_cache_from(): small simplification This change 'opens' the code block which maps the index file into memory, making the code clearer and easier to read. Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-25 16:18:17 +02:00
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`mmap_size = xsize_t(st.st_size);`
			`if (mmap_size < sizeof(struct cache_header) + 20)`
read_cache_from(): small simplification This change 'opens' the code block which maps the index file into memory, making the code clearer and easier to read. Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-25 16:18:17 +02:00			`die("index file smaller than expected");`

Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`mmap = xmmap(NULL, mmap_size, PROT_READ \| PROT_WRITE, MAP_PRIVATE, fd, 0);`
			`if (mmap == MAP_FAILED)`
Use die_errno() instead of die() when checking syscalls Lots of die() calls did not actually report the kind of error, which can leave the user confused as to the real problem. Use die_errno() where we check a system/library call that sets errno on failure, or one of the following that wrap such calls: Function Passes on error from -------- -------------------- odb_pack_keep open read_ancestry fopen read_in_full xread strbuf_read xread strbuf_read_file open or strbuf_read_file strbuf_readlink readlink write_in_full xwrite Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-27 17:58:47 +02:00			`die_errno("unable to map index file");`
read_index_from: remove bogus errno assignments These assignments comes from the very first commit e83c516 (Initial revision of "git", the information manager from hell - 2005-04-07). Back then we did not die() when errors happened so correct errno was required. Since 5d1a5c0 ([PATCH] Better error reporting for "git status" - 2005-10-01), read_index_from() learned to die rather than just return -1 and these assignments became irrelevant. Remove them. While at it, move die_errno() next to xmmap() call because it's the mmap's error code that we care about. Otherwise if close(fd); fails, it could overwrite mmap's errno. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-08-06 13:27:09 +02:00			`close(fd);`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`hdr = mmap;`
			`if (verify_hdr(hdr, mmap_size) < 0)`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00			`goto unmap;`

read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`istate->version = ntohl(hdr->hdr_version);`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`istate->cache_nr = ntohl(hdr->hdr_entries);`
			`istate->cache_alloc = alloc_nr(istate->cache_nr);`
read-cache: trivial style cleanups Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-30 15:56:19 +02:00			`istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));`
unpack_trees(): protect the handcrafted in-core index from read_cache() unpack_trees() rebuilds the in-core index from scratch by allocating a new structure and finishing it off by copying the built one to the final index. The resulting in-core index is Ok for most use, but read_cache() does not recognize it as such. The function is meant to be no-op if you already have loaded the index, until you call discard_cache(). This change the way read_cache() detects an already initialized in-core index, by introducing an extra bit, and marks the handcrafted in-core index as initialized, to avoid this problem. A better fix in the longer term would be to change the read_cache() API so that it will always discard and re-read from the on-disk index to avoid confusion. But there are higher level API that have relied on the current semantics, and they and their users all need to get converted, which is outside the scope of 'maint' track. An example of such a higher level API is write_cache_as_tree(), which is used by git-write-tree as well as later Porcelains like git-merge, revert and cherry-pick. In the longer term, we should remove read_cache() from there and add one to cmd_write_tree(); other callers expect that the in-core index they prepared is what gets written as a tree so no other change is necessary for this particular codepath. The original version of this patch marked the index by pointing an otherwise wasted malloc'ed memory with o->result.alloc, but this version uses Linus's idea to use a new "initialized" bit, which is conceptually much cleaner. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-23 21:57:30 +02:00			`istate->initialized = 1;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00
read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`if (istate->version == 4)`
read-cache.c: read prefix-compressed names in index on-disk version v4 Because the entries are sorted by path, adjacent entries in the index tend to share the leading components of them, and it makes sense to only store the differences in later entries. In the v4 on-disk format of the index, each on-disk cache entry stores the number of bytes to be stripped from the end of the previous name, and the bytes to append to the result, to come up with its name. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:15 +02:00			`previous_name = &previous_name_buf;`
			`else`
			`previous_name = NULL;`

Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`src_offset = sizeof(*hdr);`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`for (i = 0; i < istate->cache_nr; i++) {`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`struct ondisk_cache_entry *disk_ce;`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`struct cache_entry *ce;`
read-cache.c: make create_from_disk() report number of bytes it consumed The function is the one that is reading from the data stream. It only is natural to make it responsible for reporting this number, not the caller. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:11 +02:00			`unsigned long consumed;`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`disk_ce = (struct ondisk_cache_entry )((char )mmap + src_offset);`
read-cache.c: read prefix-compressed names in index on-disk version v4 Because the entries are sorted by path, adjacent entries in the index tend to share the leading components of them, and it makes sense to only store the differences in later entries. In the v4 on-disk format of the index, each on-disk cache entry stores the number of bytes to be stripped from the end of the previous name, and the bytes to append to the result, to come up with its name. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:15 +02:00			`ce = create_from_disk(disk_ce, &consumed, previous_name);`
Create pathname-based hash-table lookup into index This creates a hash index of every single file added to the index. Right now that hash index isn't actually used for much: I implemented a "cache_name_exists()" function that uses it to efficiently look up a filename in the index without having to do the O(logn) binary search, but quite frankly, that's not why this patch is interesting. No, the whole and only reason to create the hash of the filenames in the index is that by modifying the hash function, you can fairly easily do things like making it always hash equivalent names into the same bucket. That, in turn, means that suddenly questions like "does this name exist in the index under an _equivalent_ name?" becomes much much cheaper. Guiding principles behind this patch: - it shouldn't be too costly. In fact, my primary goal here was to actually speed up "git commit" with a fully populated kernel tree, by being faster at checking whether a file already existed in the index. I did succeed, but only barely: Best before: [torvalds@woody linux]$ time git commit > /dev/null real 0m0.255s user 0m0.168s sys 0m0.088s Best after: [torvalds@woody linux]$ time ~/git/git commit > /dev/null real 0m0.233s user 0m0.144s sys 0m0.088s so some things are actually faster (~8%). Caveat: that's really the best case. Other things are invariably going to be slightly slower, since we populate that index cache, and quite frankly, few things really use it to look things up. That said, the cost is really quite small. The worst case is probably doing a "git ls-files", which will do very little except puopulate the index, and never actually looks anything up in it, just lists it. Before: [torvalds@woody linux]$ time git ls-files > /dev/null real 0m0.016s user 0m0.016s sys 0m0.000s After: [torvalds@woody linux]$ time ~/git/git ls-files > /dev/null real 0m0.021s user 0m0.012s sys 0m0.008s and while the thing has really gotten relatively much slower, we're still talking about something almost unmeasurable (eg 5ms). And that really should be pretty much the worst case. So we lose 5ms on one "benchmark", but win 22ms on another. Pick your poison - this patch has the advantage that it will _likely_ speed up the cases that are complex and expensive more than it slows down the cases that are already so fast that nobody cares. But if you look at relative speedups/slowdowns, it doesn't look so good. - It should be simple and clean The code may be a bit subtle (the reasons I do hash removal the way I do etc), but it re-uses the existing hash.c files, so it really is fairly small and straightforward apart from a few odd details. Now, this patch on its own doesn't really do much, but I think it's worth looking at, if only because if done correctly, the name hashing really can make an improvement to the whole issue of "do we have a filename that looks like this in the index already". And at least it gets real testing by being used even by default (ie there is a real use-case for it even without any insane filesystems). NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just not ashamed of it enough to really care. I took all the numbers out of my nether regions - I'm sure it's good enough that it works in practice, but the whole point was that you can make a really much fancier hash that hashes characters not directly, but by their upper-case value or something like that, and thus you get a case-insensitive hash, while still keeping the name and the index itself totally case sensitive. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-23 03:41:14 +01:00			`set_index_entry(istate, i, ce);`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00
read-cache.c: make create_from_disk() report number of bytes it consumed The function is the one that is reading from the data stream. It only is natural to make it responsible for reporting this number, not the caller. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:11 +02:00			`src_offset += consumed;`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00			`}`
read-cache.c: read prefix-compressed names in index on-disk version v4 Because the entries are sorted by path, adjacent entries in the index tend to share the leading components of them, and it makes sense to only store the differences in later entries. In the v4 on-disk format of the index, each on-disk cache entry stores the number of bytes to be stripped from the end of the previous name, and the bytes to append to the result, to come up with its name. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:15 +02:00			`strbuf_release(&previous_name_buf);`
make USE_NSEC work as expected Since the filesystem ext4 is now defined as stable in Linux v2.6.28, and ext4 supports nanonsecond resolution timestamps natively, it is time to make USE_NSEC work as expected. This will make racy git situations less likely to happen. For 'git checkout' this means it will be less likely that we have to open, read the contents of the file into RAM, and check if file is really modified or not. The result sould be a litle less used CPU time, less pagefaults and a litle faster program, at least for 'git checkout'. Since the number of possible racy git situations would increase when disks gets faster, this patch would be more and more helpfull as times go by. For a fast Solid State Disk, this patch should be helpfull. Note that, when file operations starts to take less than 1 nanosecond, one would again start to get more racy git situations. For more info on racy git, see Documentation/technical/racy-git.txt For more info on ext4, see http://kernelnewbies.org/Ext4 Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-19 21:08:29 +01:00			`istate->timestamp.sec = st.st_mtime;`
Record ns-timestamps if possible, but do not use it without USE_NSEC Traditionally, the lack of USE_NSEC meant "do not record nor use the nanosecond resolution part of the file timestamps". To avoid problems on filesystems that lose the ns part when the metadata is flushed to the disk and then later read back in, disabling USE_NSEC has been a good idea in general. If you are on a filesystem without such an issue, it does not hurt to read and store them in the cached stat data in the index entries even if your git is compiled without USE_NSEC. The index left with such a version of git can be read by git compiled with USE_NSEC and it can make use of the nanosecond part to optimize the check to see if the path on the filesystem hsa been modified since we last looked at. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-04 18:47:40 +01:00			`istate->timestamp.nsec = ST_MTIME_NSEC(st);`
make USE_NSEC work as expected Since the filesystem ext4 is now defined as stable in Linux v2.6.28, and ext4 supports nanonsecond resolution timestamps natively, it is time to make USE_NSEC work as expected. This will make racy git situations less likely to happen. For 'git checkout' this means it will be less likely that we have to open, read the contents of the file into RAM, and check if file is really modified or not. The result sould be a litle less used CPU time, less pagefaults and a litle faster program, at least for 'git checkout'. Since the number of possible racy git situations would increase when disks gets faster, this patch would be more and more helpfull as times go by. For a fast Solid State Disk, this patch should be helpfull. Note that, when file operations starts to take less than 1 nanosecond, one would again start to get more racy git situations. For more info on racy git, see Documentation/technical/racy-git.txt For more info on ext4, see http://kernelnewbies.org/Ext4 Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-19 21:08:29 +01:00
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`while (src_offset <= mmap_size - 20 - 8) {`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`/* After an array of active_nr index entries,`
			`* there can be arbitrary number of extended`
			`* sections, each of which is prefixed with`
			`* extension name (4-byte) and section length`
			`* in 4-byte network byte order.`
			`*/`
read_index(): fix reading extension size on BE 64-bit archs On big endian platforms with 8-byte unsigned long, the code reads the size of the index extension section (which is a 4-byte network byte order integer) incorrectly. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-27 07:11:21 +01:00			`uint32_t extsize;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`memcpy(&extsize, (char *)mmap + src_offset + 4, 4);`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`extsize = ntohl(extsize);`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`if (read_index_extension(istate,`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`(const char *) mmap + src_offset,`
			`(char *) mmap + src_offset + 8,`
Remove all void-pointer arithmetic. ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:09 +02:00			`extsize) < 0)`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`goto unmap;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`src_offset += 8;`
			`src_offset += extsize;`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`}`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`munmap(mmap, mmap_size);`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`return istate->cache_nr;`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00
			`unmap:`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`munmap(mmap, mmap_size);`
[PATCH] Better error reporting for "git status" Instead of "git status" ignoring (and hiding) potential errors from the "git-update-index" call, make it exit if it fails, and show the error. In order to do this, use the "-q" flag (to ignore not-up-to-date files) and add a new "--unmerged" flag that allows unmerged entries in the index without any errors. This also avoids marking the index "changed" if an entry isn't actually modified, and makes sure that we exit with an understandable error message if the index is corrupt or unreadable. "read_cache()" no longer returns an error for the caller to check. Finally, make die() and usage() exit with recognizable error codes, if we ever want to check the failure reason in scripts. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-10-01 22:24:27 +02:00			`die("index file corrupt");`
Initial revision of "git", the information manager from hell 2005-04-08 00:13:13 +02:00			`}`

checkout: Fix "initial checkout" detection Earlier commit 5521883 (checkout: do not lose staged removal, 2008-09-07) tightened the rule to prevent switching branches from losing local changes, so that staged removal of paths can be protected, while attempting to keep a loophole to still allow a special case of switching out of an un-checked-out state. However, the loophole was made a bit too tight, and did not allow switching from one branch (in an un-checked-out state) to check out another branch. The change to builtin-checkout.c in this commit loosens it to allow this, by not insisting the original commit and the new commit to be the same. It also introduces a new function, is_index_unborn (and an associated macro, is_cache_unborn), to check if the repository is truly in an un-checked-out state more reliably, by making sure that $GIT_INDEX_FILE did not exist when populating the in-core index structure. A few places the earlier commit 5521883 added the check for the initial checkout condition are updated to use this function. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-11-12 20:52:35 +01:00			`int is_index_unborn(struct index_state *istate)`
			`{`
read-cache.c: allocate index entries individually The code to estimate the in-memory size of the index based on its on-disk representation is subtly wrong for certain architecture-dependent struct layouts. Instead of fixing it, replace the code to keep the index entries in a single large block of memory and allocate each entry separately instead. This is both simpler and more flexible, as individual entries can now be freed. Actually using that added flexibility is left for a later patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 23:59:14 +02:00			`return (!istate->cache_nr && !istate->timestamp.sec);`
checkout: Fix "initial checkout" detection Earlier commit 5521883 (checkout: do not lose staged removal, 2008-09-07) tightened the rule to prevent switching branches from losing local changes, so that staged removal of paths can be protected, while attempting to keep a loophole to still allow a special case of switching out of an un-checked-out state. However, the loophole was made a bit too tight, and did not allow switching from one branch (in an un-checked-out state) to check out another branch. The change to builtin-checkout.c in this commit loosens it to allow this, by not insisting the original commit and the new commit to be the same. It also introduces a new function, is_index_unborn (and an associated macro, is_cache_unborn), to check if the repository is truly in an un-checked-out state more reliably, by making sure that $GIT_INDEX_FILE did not exist when populating the in-core index structure. A few places the earlier commit 5521883 added the check for the initial checkout condition are updated to use this function. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-11-12 20:52:35 +01:00			`}`

Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`int discard_index(struct index_state *istate)`
Status update on merge-recursive in C This is just an update for people being interested. Alex and me were busy with that project for a few days now. While it has progressed nicely, there are quite a couple TODOs in merge-recursive.c, just search for "TODO". For impatient people: yes, it passes all the tests, and yes, according to the evil test Alex did, it is faster than the Python script. But no, it is not yet finished. Biggest points are: - there are still three external calls - in the end, it should not be necessary to write the index more than once (just before exiting) - a lot of things can be refactored to make the code easier and shorter BTW we cannot just plug in git-merge-tree yet, because git-merge-tree does not handle renames at all. This patch is meant for testing, and as such, - it compile the program to git-merge-recur - it adjusts the scripts and tests to use git-merge-recur instead of git-merge-recursive - it provides "TEST", a script to execute the tests regarding -recursive - it inlines the changes to read-cache.c (read_cache_from(), discard_cache() and refresh_cache_entry()) Brought to you by Alex Riesen and Dscho Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-08 18:42:41 +02:00			`{`
read-cache.c: allocate index entries individually The code to estimate the in-memory size of the index based on its on-disk representation is subtly wrong for certain architecture-dependent struct layouts. Instead of fixing it, replace the code to keep the index entries in a single large block of memory and allocate each entry separately instead. This is both simpler and more flexible, as individual entries can now be freed. Actually using that added flexibility is left for a later patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-24 23:59:14 +02:00			`int i;`

			`for (i = 0; i < istate->cache_nr; i++)`
			`free(istate->cache[i]);`
resolve-undo: record resolved conflicts in a new index extension section When resolving a conflict using "git add" to create a stage #0 entry, or "git rm" to remove entries at higher stages, remove_index_entry_at() function is eventually called to remove unmerged (i.e. higher stage) entries from the index. Introduce a "resolve_undo_info" structure and keep track of the removed cache entries, and save it in a new index extension section in the index_state. Operations like "read-tree -m", "merge", "checkout [-m] <branch>" and "reset" are signs that recorded information in the index is no longer necessary. The data is removed from the index extension when operations start; they may leave conflicted entries in the index, and later user actions like "git add" will record their conflicted states afresh. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-25 09:30:51 +01:00			`resolve_undo_clear_index(istate);`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`istate->cache_nr = 0;`
			`istate->cache_changed = 0;`
make USE_NSEC work as expected Since the filesystem ext4 is now defined as stable in Linux v2.6.28, and ext4 supports nanonsecond resolution timestamps natively, it is time to make USE_NSEC work as expected. This will make racy git situations less likely to happen. For 'git checkout' this means it will be less likely that we have to open, read the contents of the file into RAM, and check if file is really modified or not. The result sould be a litle less used CPU time, less pagefaults and a litle faster program, at least for 'git checkout'. Since the number of possible racy git situations would increase when disks gets faster, this patch would be more and more helpfull as times go by. For a fast Solid State Disk, this patch should be helpfull. Note that, when file operations starts to take less than 1 nanosecond, one would again start to get more racy git situations. For more info on racy git, see Documentation/technical/racy-git.txt For more info on ext4, see http://kernelnewbies.org/Ext4 Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-19 21:08:29 +01:00			`istate->timestamp.sec = 0;`
			`istate->timestamp.nsec = 0;`
name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`free_name_hash(istate);`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`cache_tree_free(&(istate->cache_tree));`
unpack_trees(): protect the handcrafted in-core index from read_cache() unpack_trees() rebuilds the in-core index from scratch by allocating a new structure and finishing it off by copying the built one to the final index. The resulting in-core index is Ok for most use, but read_cache() does not recognize it as such. The function is meant to be no-op if you already have loaded the index, until you call discard_cache(). This change the way read_cache() detects an already initialized in-core index, by introducing an extra bit, and marks the handcrafted in-core index as initialized, to avoid this problem. A better fix in the longer term would be to change the read_cache() API so that it will always discard and re-read from the on-disk index to avoid confusion. But there are higher level API that have relied on the current semantics, and they and their users all need to get converted, which is outside the scope of 'maint' track. An example of such a higher level API is write_cache_as_tree(), which is used by git-write-tree as well as later Porcelains like git-merge, revert and cherry-pick. In the longer term, we should remove read_cache() from there and add one to cmd_write_tree(); other callers expect that the in-core index they prepared is what gets written as a tree so no other change is necessary for this particular codepath. The original version of this patch marked the index by pointing an otherwise wasted malloc'ed memory with o->result.alloc, but this version uses Linus's idea to use a new "initialized" bit, which is conceptually much cleaner. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-23 21:57:30 +02:00			`istate->initialized = 0;`
read-cache: free cache in discard_index discard_cache doesn't have to free the array of cache entries, because the next call of read_cache can simply reuse it, as they all operate on the global variable the_index. discard_index on the other hand does have to free it, because it can be used e.g. with index_state variables on the stack, in which case a missing free would cause an unrecoverable leak. This patch releases the memory and removes a comment that was relevant for discard_cache but has become outdated. Since discard_cache is just a wrapper around discard_index nowadays, we lose the optimization that avoids reallocation of that array within loops of read_cache and discard_cache. That doesn't cause a performance regression for me, however (HEAD = this patch, HEAD^ = master + p0002): Test // HEAD^ HEAD ---------------\\----------------------------------------------------- 0002.1: read_ca// 1000 times 0.62(0.58+0.04) 0.61(0.58+0.02) -1.6% Suggested-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-09 19:39:18 +02:00			`free(istate->cache);`
			`istate->cache = NULL;`
			`istate->cache_alloc = 0;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`return 0;`
Status update on merge-recursive in C This is just an update for people being interested. Alex and me were busy with that project for a few days now. While it has progressed nicely, there are quite a couple TODOs in merge-recursive.c, just search for "TODO". For impatient people: yes, it passes all the tests, and yes, according to the evil test Alex did, it is faster than the Python script. But no, it is not yet finished. Biggest points are: - there are still three external calls - in the end, it should not be necessary to write the index more than once (just before exiting) - a lot of things can be refactored to make the code easier and shorter BTW we cannot just plug in git-merge-tree yet, because git-merge-tree does not handle renames at all. This patch is meant for testing, and as such, - it compile the program to git-merge-recur - it adjusts the scripts and tests to use git-merge-recur instead of git-merge-recursive - it provides "TEST", a script to execute the tests regarding -recursive - it inlines the changes to read-cache.c (read_cache_from(), discard_cache() and refresh_cache_entry()) Brought to you by Alex Riesen and Dscho Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-08 18:42:41 +02:00			`}`

Add 'const' where appropriate to index handling functions This is in an effort to make the source index of 'unpack_trees()' as being const, and thus making the compiler help us verify that we only access it for reading. The constification also extended to some of the hashing helpers that get called indirectly. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-06 21:46:09 +01:00			`int unmerged_index(const struct index_state *istate)`
Library function to check for unmerged index entries It's small, but it was in three places already, so it should be in the library. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> 2008-02-07 17:40:13 +01:00			`{`
			`int i;`
			`for (i = 0; i < istate->cache_nr; i++) {`
			`if (ce_stage(istate->cache[i]))`
			`return 1;`
			`}`
			`return 0;`
			`}`

Speed up index file writing by chunking it nicely. No point in making 17,000 small writes when you can make just a couple of hundred nice 8kB writes instead and save a lot of time. 2005-04-20 21:16:57 +02:00			`#define WRITE_BUFFER_SIZE 8192`
[PATCH] Kill a bunch of pointer sign warnings for gcc4 - Raw hashes should be unsigned char. - String functions want signed char. - Hash and compress functions want unsigned char. Signed-off By: Brian Gerst <bgerst@didntduck.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-18 14:14:09 +02:00			`static unsigned char write_buffer[WRITE_BUFFER_SIZE];`
Speed up index file writing by chunking it nicely. No point in making 17,000 small writes when you can make just a couple of hundred nice 8kB writes instead and save a lot of time. 2005-04-20 21:16:57 +02:00			`static unsigned long write_buffer_len;`

fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`static int ce_write_flush(git_SHA_CTX *context, int fd)`
read-cache: tweak racy-git delay logic Instead of looping over the entries and writing out, use a separate loop after all entries have been written out to check how many entries are racily clean. Make sure that the newly created index file gets the right timestamp when we check by flushing the buffered data by ce_write(). Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-08 23:47:32 +02:00			`{`
			`unsigned int buffered = write_buffer_len;`
			`if (buffered) {`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Update(context, write_buffer, buffered);`
short i/o: fix calls to write to use xwrite or write_in_full We have a number of badly checked write() calls. Often we are expecting write() to write exactly the size we requested or fail, this fails to handle interrupts or short writes. Switch to using the new write_in_full(). Otherwise we at a minimum need to check for EINTR and EAGAIN, where this is appropriate use xwrite(). Note, the changes to config handling are much larger and handled in the next patch in the sequence. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-08 16:58:23 +01:00			`if (write_in_full(fd, write_buffer, buffered) != buffered)`
read-cache: tweak racy-git delay logic Instead of looping over the entries and writing out, use a separate loop after all entries have been written out to check how many entries are racily clean. Make sure that the newly created index file gets the right timestamp when we check by flushing the buffered data by ce_write(). Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-08 23:47:32 +02:00			`return -1;`
			`write_buffer_len = 0;`
			`}`
			`return 0;`
			`}`

fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`static int ce_write(git_SHA_CTX context, int fd, void data, unsigned int len)`
Speed up index file writing by chunking it nicely. No point in making 17,000 small writes when you can make just a couple of hundred nice 8kB writes instead and save a lot of time. 2005-04-20 21:16:57 +02:00			`{`
			`while (len) {`
			`unsigned int buffered = write_buffer_len;`
			`unsigned int partial = WRITE_BUFFER_SIZE - buffered;`
			`if (partial > len)`
			`partial = len;`
			`memcpy(write_buffer + buffered, data, partial);`
			`buffered += partial;`
			`if (buffered == WRITE_BUFFER_SIZE) {`
read-cache: tweak racy-git delay logic Instead of looping over the entries and writing out, use a separate loop after all entries have been written out to check how many entries are racily clean. Make sure that the newly created index file gets the right timestamp when we check by flushing the buffered data by ce_write(). Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-08 23:47:32 +02:00			`write_buffer_len = buffered;`
			`if (ce_write_flush(context, fd))`
Speed up index file writing by chunking it nicely. No point in making 17,000 small writes when you can make just a couple of hundred nice 8kB writes instead and save a lot of time. 2005-04-20 21:16:57 +02:00			`return -1;`
			`buffered = 0;`
			`}`
			`write_buffer_len = buffered;`
			`len -= partial;`
Remove all void-pointer arithmetic. ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:09 +02:00			`data = (char *) data + partial;`
War on whitespace This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-07 09:04:01 +02:00			`}`
			`return 0;`
Speed up index file writing by chunking it nicely. No point in making 17,000 small writes when you can make just a couple of hundred nice 8kB writes instead and save a lot of time. 2005-04-20 21:16:57 +02:00			`}`

fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`static int write_index_ext_header(git_SHA_CTX *context, int fd,`
git-write-tree writes garbage on sparc64 In the "next" branch, write_index_ext_header() writes garbage on a 64-bit big-endian machine; the written index file will be unreadable. I noticed this on NetBSD/sparc64. Reproducible with: $ git init-db $ :>file $ git-update-index --add file $ git-write-tree $ git-update-index error: index uses extension, which we do not understand fatal: index file corrupt Signed-off-by: Dennis Stosberg <dennis@stosberg.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-28 21:08:08 +02:00			`unsigned int ext, unsigned int sz)`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`{`
			`ext = htonl(ext);`
			`sz = htonl(sz);`
read-cache.c cleanup Removes conditional returns. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-14 22:38:14 +02:00			`return ((ce_write(context, fd, &ext, 4) < 0) \|\|`
			`(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`}`

fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`static int ce_flush(git_SHA_CTX *context, int fd)`
Speed up index file writing by chunking it nicely. No point in making 17,000 small writes when you can make just a couple of hundred nice 8kB writes instead and save a lot of time. 2005-04-20 21:16:57 +02:00			`{`
			`unsigned int left = write_buffer_len;`
Make the sha1 of the index file go at the very end of the file. This allows us to both calculate it and verify it faster. 2005-04-20 21:36:41 +02:00
Speed up index file writing by chunking it nicely. No point in making 17,000 small writes when you can make just a couple of hundred nice 8kB writes instead and save a lot of time. 2005-04-20 21:16:57 +02:00			`if (left) {`
			`write_buffer_len = 0;`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Update(context, write_buffer, left);`
Speed up index file writing by chunking it nicely. No point in making 17,000 small writes when you can make just a couple of hundred nice 8kB writes instead and save a lot of time. 2005-04-20 21:16:57 +02:00			`}`
Make the sha1 of the index file go at the very end of the file. This allows us to both calculate it and verify it faster. 2005-04-20 21:36:41 +02:00
[PATCH] Fix buffer overflow in ce_flush(). Add a check before appending SHA1 signature to write_buffer, flush it first if necessary. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-11 15:27:47 +02:00			`/* Flush first if not enough space for SHA1 signature */`
			`if (left + 20 > WRITE_BUFFER_SIZE) {`
short i/o: fix calls to write to use xwrite or write_in_full We have a number of badly checked write() calls. Often we are expecting write() to write exactly the size we requested or fail, this fails to handle interrupts or short writes. Switch to using the new write_in_full(). Otherwise we at a minimum need to check for EINTR and EAGAIN, where this is appropriate use xwrite(). Note, the changes to config handling are much larger and handled in the next patch in the sequence. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-08 16:58:23 +01:00			`if (write_in_full(fd, write_buffer, left) != left)`
[PATCH] Fix buffer overflow in ce_flush(). Add a check before appending SHA1 signature to write_buffer, flush it first if necessary. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-09-11 15:27:47 +02:00			`return -1;`
			`left = 0;`
			`}`

Make the sha1 of the index file go at the very end of the file. This allows us to both calculate it and verify it faster. 2005-04-20 21:36:41 +02:00			`/* Append the SHA1 signature at the end */`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Final(write_buffer + left, context);`
Make the sha1 of the index file go at the very end of the file. This allows us to both calculate it and verify it faster. 2005-04-20 21:36:41 +02:00			`left += 20;`
short i/o: fix calls to write to use xwrite or write_in_full We have a number of badly checked write() calls. Often we are expecting write() to write exactly the size we requested or fail, this fails to handle interrupts or short writes. Switch to using the new write_in_full(). Otherwise we at a minimum need to check for EINTR and EAGAIN, where this is appropriate use xwrite(). Note, the changes to config handling are much larger and handled in the next patch in the sequence. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-08 16:58:23 +01:00			`return (write_in_full(fd, write_buffer, left) != left) ? -1 : 0;`
Speed up index file writing by chunking it nicely. No point in making 17,000 small writes when you can make just a couple of hundred nice 8kB writes instead and save a lot of time. 2005-04-20 21:16:57 +02:00			`}`

Racy GIT (part #2) The previous round caught the most trivial case well, but broke down once index file is updated again. Smudge problematic entries (they should be very few if any under normal interactive workflow) before writing a new index file out. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 21:12:18 +01:00			`static void ce_smudge_racily_clean_entry(struct cache_entry *ce)`
			`{`
			`/*`
			`* The only thing we care about in this function is to smudge the`
			`* falsely clean entry due to touch-update-touch race, so we leave`
			`* everything else as they are. We are called for entries whose`
Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`* ce_stat_data.sd_mtime match the index file mtime.`
Teach gitlinks to ie_modified() and ce_modified_check_fs() The ie_modified() function is the workhorse for refresh_cache_entry(), i.e. checking if an index entry that is stat-dirty actually has changes. After running quicker check to compare cached stat information with results from the latest lstat(2) to answer "has modification" early, the code goes on to check if there really is a change by comparing the staged data with what is on the filesystem by asking ce_modified_check_fs(). However, this function always said "no change" for any gitlinks that has a directory at the corresponding path. This made ie_modified() to miss actual changes in the subproject. The patch fixes this first by modifying an existing short-circuit logic before calling the ce_modified_check_fs() function. It knows that for any filesystem entity to which ie_match_stat() says its data has changed, if its cached size is nonzero then the contents cannot match, which is a correct optimization only for blob objects. We teach gitlink objects to this special case, as we already know that any gitlink that ie_match_stat() says is modified is indeed modified at this point in the codepath. With the above change, we could leave ce_modified_check_fs() broken, but it also futureproofs the code by teaching it to use ce_compare_gitlink(), instead of assuming (incorrectly) that any directory is unchanged. Originally noticed by Alex Riesen on Cygwin. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-29 10:13:44 +02:00			`*`
			`* Note that this actually does not do much for gitlinks, for`
			`* which ce_match_stat_basic() always goes to the actual`
			`* contents. The caller checks with is_racy_timestamp() which`
			`* always says "no" for gitlinks, so we are not called for them ;-)`
Racy GIT (part #2) The previous round caught the most trivial case well, but broke down once index file is updated again. Smudge problematic entries (they should be very few if any under normal interactive workflow) before writing a new index file out. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 21:12:18 +01:00			`*/`
			`struct stat st;`

			`if (lstat(ce->name, &st) < 0)`
			`return;`
			`if (ce_match_stat_basic(ce, &st))`
			`return;`
			`if (ce_modified_check_fs(ce, &st)) {`
ce_smudge_racily_clean_entry: explain why it works. This is a tricky code and warrants extra commenting. I wasted 30 minutes trying to break it until I realized why it works. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 23:18:47 +01:00			`/* This is "racily clean"; smudge it. Note that this`
			`* is a tricky code. At first glance, it may appear`
			`* that it can break with this sequence:`
			`*`
			`* $ echo xyzzy >frotz`
			`* $ git-update-index --add frotz`
			`* $ : >frotz`
			`* $ sleep 3`
			`* $ echo filfre >nitfol`
			`* $ git-update-index --add nitfol`
			`*`
Racy git: avoid having to be always too careful Immediately after a bulk checkout, most of the paths in the working tree would have the same timestamp as the index file, and this would force ce_match_stat() to take slow path for all of them. When writing an index file out, if many of the paths have very new (read: the same timestamp as the index file being written out) timestamp, we are better off delaying the return from the command, to make sure that later command to touch the working tree files will leave newer timestamps than recorded in the index, thereby avoiding to take the slow path. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-05 13:16:02 +02:00			`* but it does not. When the second update-index runs,`
ce_smudge_racily_clean_entry: explain why it works. This is a tricky code and warrants extra commenting. I wasted 30 minutes trying to break it until I realized why it works. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 23:18:47 +01:00			`* it notices that the entry "frotz" has the same timestamp`
			`* as index, and if we were to smudge it by resetting its`
			`* size to zero here, then the object name recorded`
			`* in index is the 6-byte file but the cached stat information`
			`* becomes zero --- which would then match what we would`
War on whitespace This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-07 09:04:01 +02:00			`* obtain from the filesystem next time we stat("frotz").`
ce_smudge_racily_clean_entry: explain why it works. This is a tricky code and warrants extra commenting. I wasted 30 minutes trying to break it until I realized why it works. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 23:18:47 +01:00			`*`
			`* However, the second update-index, before calling`
			`* this function, notices that the cached size is 6`
			`* bytes and what is on the filesystem is an empty`
			`* file, and never calls us, so the cached size information`
			`* for "frotz" stays 6 which does not match the filesystem.`
			`*/`
Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`ce->ce_stat_data.sd_size = 0;`
Racy GIT (part #2) The previous round caught the most trivial case well, but broke down once index file is updated again. Smudge problematic entries (they should be very few if any under normal interactive workflow) before writing a new index file out. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 21:12:18 +01:00			`}`
			`}`

read-cache.c: move code to copy incore to ondisk cache to a helper function This makes the change in a later patch look less scary. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:14 +02:00			`/* Copy miscellaneous fields but not the name */`
			`static char copy_cache_entry_to_ondisk(struct ondisk_cache_entry ondisk,`
			`struct cache_entry *ce)`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`{`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`short flags;`

Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`ondisk->ctime.sec = htonl(ce->ce_stat_data.sd_ctime.sec);`
			`ondisk->mtime.sec = htonl(ce->ce_stat_data.sd_mtime.sec);`
			`ondisk->ctime.nsec = htonl(ce->ce_stat_data.sd_ctime.nsec);`
			`ondisk->mtime.nsec = htonl(ce->ce_stat_data.sd_mtime.nsec);`
			`ondisk->dev = htonl(ce->ce_stat_data.sd_dev);`
			`ondisk->ino = htonl(ce->ce_stat_data.sd_ino);`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`ondisk->mode = htonl(ce->ce_mode);`
Extract a struct stat_data from cache_entry Add public functions fill_stat_data() and match_stat_data() to work with it. This infrastructure will later be used to check the validity of other types of file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:50 +02:00			`ondisk->uid = htonl(ce->ce_stat_data.sd_uid);`
			`ondisk->gid = htonl(ce->ce_stat_data.sd_gid);`
			`ondisk->size = htonl(ce->ce_stat_data.sd_size);`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`hashcpy(ondisk->sha1, ce->sha1);`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00
			`flags = ce->ce_flags;`
			`flags \|= (ce_namelen(ce) >= CE_NAMEMASK ? CE_NAMEMASK : ce_namelen(ce));`
			`ondisk->flags = htons(flags);`
Extend index to save more flags The on-disk format of index only saves 16 bit flags, nearly all have been used. The last bit (CE_EXTENDED) is used to for future extension. This patch extends index entry format to save more flags in future. The new entry format will be used when CE_EXTENDED bit is 1. Because older implementation may not understand CE_EXTENDED bit and misread the new format, if there is any extended entry in index, index header version will turn 3, which makes it incompatible for older git. If there is none, header version will return to 2 again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 06:04:01 +02:00			`if (ce->ce_flags & CE_EXTENDED) {`
			`struct ondisk_cache_entry_extended *ondisk2;`
			`ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;`
			`ondisk2->flags2 = htons((ce->ce_flags & CE_EXTENDED_FLAGS) >> 16);`
read-cache.c: move code to copy incore to ondisk cache to a helper function This makes the change in a later patch look less scary. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:14 +02:00			`return ondisk2->name;`
Extend index to save more flags The on-disk format of index only saves 16 bit flags, nearly all have been used. The last bit (CE_EXTENDED) is used to for future extension. This patch extends index entry format to save more flags in future. The new entry format will be used when CE_EXTENDED bit is 1. Because older implementation may not understand CE_EXTENDED bit and misread the new format, if there is any extended entry in index, index header version will turn 3, which makes it incompatible for older git. If there is none, header version will return to 2 again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 06:04:01 +02:00			`}`
read-cache.c: move code to copy incore to ondisk cache to a helper function This makes the change in a later patch look less scary. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:14 +02:00			`else {`
			`return ondisk->name;`
			`}`
			`}`

read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`static int ce_write_entry(git_SHA_CTX c, int fd, struct cache_entry ce,`
			`struct strbuf *previous_name)`
read-cache.c: move code to copy incore to ondisk cache to a helper function This makes the change in a later patch look less scary. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:14 +02:00			`{`
read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`int size;`
			`struct ondisk_cache_entry *ondisk;`
read-cache.c: move code to copy incore to ondisk cache to a helper function This makes the change in a later patch look less scary. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 00:53:14 +02:00			`char *name;`
			`int result;`

read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`if (!previous_name) {`
			`size = ondisk_ce_size(ce);`
			`ondisk = xcalloc(1, size);`
			`name = copy_cache_entry_to_ondisk(ondisk, ce);`
			`memcpy(name, ce->name, ce_namelen(ce));`
			`} else {`
			`int common, to_remove, prefix_size;`
			`unsigned char to_remove_vi[16];`
			`for (common = 0;`
			`(ce->name[common] &&`
			`common < previous_name->len &&`
			`ce->name[common] == previous_name->buf[common]);`
			`common++)`
			`; /* still matching */`
			`to_remove = previous_name->len - common;`
			`prefix_size = encode_varint(to_remove, to_remove_vi);`

			`if (ce->ce_flags & CE_EXTENDED)`
			`size = offsetof(struct ondisk_cache_entry_extended, name);`
			`else`
			`size = offsetof(struct ondisk_cache_entry, name);`
			`size += prefix_size + (ce_namelen(ce) - common + 1);`

			`ondisk = xcalloc(1, size);`
			`name = copy_cache_entry_to_ondisk(ondisk, ce);`
			`memcpy(name, to_remove_vi, prefix_size);`
			`memcpy(name + prefix_size, ce->name + common, ce_namelen(ce) - common);`

			`strbuf_splice(previous_name, common, to_remove,`
			`ce->name + common, ce_namelen(ce) - common);`
Extend index to save more flags The on-disk format of index only saves 16 bit flags, nearly all have been used. The last bit (CE_EXTENDED) is used to for future extension. This patch extends index entry format to save more flags in future. The new entry format will be used when CE_EXTENDED bit is 1. Because older implementation may not understand CE_EXTENDED bit and misread the new format, if there is any extended entry in index, index header version will turn 3, which makes it incompatible for older git. If there is none, header version will return to 2 again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 06:04:01 +02:00			`}`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00
core: Stop leaking ondisk_cache_entrys Noticed with valgrind. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-10 05:28:07 +02:00			`result = ce_write(c, fd, ondisk, size);`
			`free(ondisk);`
			`return result;`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`}`

update $GIT_INDEX_FILE when there are racily clean entries Traditional "opportunistic index update" done by read-only "diff" and "status" was about updating cached lstat(2) information in the index for the next round. We missed another obvious optimization opportunity: when there are racily clean entries that will cease to be racily clean by updating $GIT_INDEX_FILE. Detect that case and write $GIT_INDEX_FILE out to give it a newer timestamp. Noticed by Lasse Makholm by stracing "git status" in a fresh checkout and counting the number of open(2) calls. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-21 18:18:19 +01:00			`static int has_racy_timestamp(struct index_state *istate)`
			`{`
			`int entries = istate->cache_nr;`
			`int i;`

			`for (i = 0; i < entries; i++) {`
			`struct cache_entry *ce = istate->cache[i];`
			`if (is_racy_timestamp(istate, ce))`
			`return 1;`
			`}`
			`return 0;`
			`}`

diff/status: refactor opportunistic index update When we had to refresh the index internally before running diff or status, we opportunistically updated the $GIT_INDEX_FILE so that later invocation of git can use the lstat(2) we already did in this invocation. Make them share a helper function to do so. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-21 18:16:10 +01:00			`/*`
many small typofixes Signed-off-by: Ondřej Bílka <neleai@seznam.cz> Reviewed-by: Marc Branchaud <marcnarc@xiplink.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-29 10:18:21 +02:00			`* Opportunistically update the index but do not complain if we can't`
diff/status: refactor opportunistic index update When we had to refresh the index internally before running diff or status, we opportunistically updated the $GIT_INDEX_FILE so that later invocation of git can use the lstat(2) we already did in this invocation. Make them share a helper function to do so. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-21 18:16:10 +01:00			`*/`
			`void update_index_if_able(struct index_state istate, struct lock_file lockfile)`
			`{`
update $GIT_INDEX_FILE when there are racily clean entries Traditional "opportunistic index update" done by read-only "diff" and "status" was about updating cached lstat(2) information in the index for the next round. We missed another obvious optimization opportunity: when there are racily clean entries that will cease to be racily clean by updating $GIT_INDEX_FILE. Detect that case and write $GIT_INDEX_FILE out to give it a newer timestamp. Noticed by Lasse Makholm by stracing "git status" in a fresh checkout and counting the number of open(2) calls. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-21 18:18:19 +01:00			`if ((istate->cache_changed \|\| has_racy_timestamp(istate)) &&`
diff/status: refactor opportunistic index update When we had to refresh the index internally before running diff or status, we opportunistically updated the $GIT_INDEX_FILE so that later invocation of git can use the lstat(2) we already did in this invocation. Make them share a helper function to do so. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-03-21 18:16:10 +01:00			`!write_index(istate, lockfile->fd))`
			`commit_locked_index(lockfile);`
			`else`
			`rollback_lock_file(lockfile);`
			`}`

write_index(): update index_state->timestamp after flushing to disk Since this timestamp is used to check for racy-clean files, it is important to keep it uptodate. For the 'git checkout' command without the '-q' option, this make a huge difference. Before, each and every file which was updated, was racy-clean after the call to unpack_trees() and write_index() but before the GIT process ended. And because of the call to show_local_changes() in builtin-checkout.c, we ended up reading those files back into memory, doing a SHA1 to check if the files was really different from the index. And, of course, no file was different. With this fix, 'git checkout' without the '-q' option should now be almost as fast as with the '-q' option, but not quite, as we still do some few lstat(2) calls more without the '-q' option. Below is some average numbers for 10 checkout's to v2.6.27 and 10 to v2.6.25 of the Linux kernel, to show the difference: before (git version 1.6.2.rc1.256.g58a87): 7.860 user 2.427 sys 19.465 real 52.8% CPU faults: 0 major 95331 minor after: 6.184 user 2.160 sys 17.619 real 47.4% CPU faults: 0 major 38994 minor Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-23 19:02:57 +01:00			`int write_index(struct index_state *istate, int newfd)`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`{`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA_CTX c;`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`struct cache_header hdr;`
read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`int i, err, removed, extended, hdr_version;`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`struct cache_entry **cache = istate->cache;`
			`int entries = istate->cache_nr;`
write_index(): update index_state->timestamp after flushing to disk Since this timestamp is used to check for racy-clean files, it is important to keep it uptodate. For the 'git checkout' command without the '-q' option, this make a huge difference. Before, each and every file which was updated, was racy-clean after the call to unpack_trees() and write_index() but before the GIT process ended. And because of the call to show_local_changes() in builtin-checkout.c, we ended up reading those files back into memory, doing a SHA1 to check if the files was really different from the index. And, of course, no file was different. With this fix, 'git checkout' without the '-q' option should now be almost as fast as with the '-q' option, but not quite, as we still do some few lstat(2) calls more without the '-q' option. Below is some average numbers for 10 checkout's to v2.6.27 and 10 to v2.6.25 of the Linux kernel, to show the difference: before (git version 1.6.2.rc1.256.g58a87): 7.860 user 2.427 sys 19.465 real 52.8% CPU faults: 0 major 95331 minor after: 6.184 user 2.160 sys 17.619 real 47.4% CPU faults: 0 major 38994 minor Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-23 19:02:57 +01:00			`struct stat st;`
read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;`
[PATCH] Bugfix: read-cache.c:write_cache() misrecords number of entries. When we choose to omit deleted entries, we should subtract numbers of such entries from the total number in the header. Signed-off-by: Junio C Hamano <junkio@cox.net> Oops. Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-10 10:32:37 +02:00
Extend index to save more flags The on-disk format of index only saves 16 bit flags, nearly all have been used. The last bit (CE_EXTENDED) is used to for future extension. This patch extends index entry format to save more flags in future. The new entry format will be used when CE_EXTENDED bit is 1. Because older implementation may not understand CE_EXTENDED bit and misread the new format, if there is any extended entry in index, index header version will turn 3, which makes it incompatible for older git. If there is none, header version will return to 2 again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 06:04:01 +02:00			`for (i = removed = extended = 0; i < entries; i++) {`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`if (cache[i]->ce_flags & CE_REMOVE)`
[PATCH] Bugfix: read-cache.c:write_cache() misrecords number of entries. When we choose to omit deleted entries, we should subtract numbers of such entries from the total number in the header. Signed-off-by: Junio C Hamano <junkio@cox.net> Oops. Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-10 10:32:37 +02:00			`removed++;`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00
Extend index to save more flags The on-disk format of index only saves 16 bit flags, nearly all have been used. The last bit (CE_EXTENDED) is used to for future extension. This patch extends index entry format to save more flags in future. The new entry format will be used when CE_EXTENDED bit is 1. Because older implementation may not understand CE_EXTENDED bit and misread the new format, if there is any extended entry in index, index header version will turn 3, which makes it incompatible for older git. If there is none, header version will return to 2 again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 06:04:01 +02:00			`/* reduce extended entries if possible */`
			`cache[i]->ce_flags &= ~CE_EXTENDED;`
			`if (cache[i]->ce_flags & CE_EXTENDED_FLAGS) {`
			`extended++;`
			`cache[i]->ce_flags \|= CE_EXTENDED;`
			`}`
			`}`

read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`if (!istate->version)`
			`istate->version = INDEX_FORMAT_DEFAULT;`

			`/* demote version 3 to version 2 when the latter suffices */`
			`if (istate->version == 3 \|\| istate->version == 2)`
			`istate->version = extended ? 3 : 2;`

			`hdr_version = istate->version;`

Convert the index file reading/writing to use network byte order. This allows using a git tree over NFS with different byte order, and makes it possible to just copy a fully populated repository and have the end result immediately usable (needing just a refresh to update the stat information). 2005-04-15 19:44:27 +02:00			`hdr.hdr_signature = htonl(CACHE_SIGNATURE);`
read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`hdr.hdr_version = htonl(hdr_version);`
[PATCH] Bugfix: read-cache.c:write_cache() misrecords number of entries. When we choose to omit deleted entries, we should subtract numbers of such entries from the total number in the header. Signed-off-by: Junio C Hamano <junkio@cox.net> Oops. Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-10 10:32:37 +02:00			`hdr.hdr_entries = htonl(entries - removed);`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Init(&c);`
Make the sha1 of the index file go at the very end of the file. This allows us to both calculate it and verify it faster. 2005-04-20 21:36:41 +02:00			`if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`return -1;`

read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`previous_name = (hdr_version == 4) ? &previous_name_buf : NULL;`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`for (i = 0; i < entries; i++) {`
			`struct cache_entry *ce = cache[i];`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`if (ce->ce_flags & CE_REMOVE)`
git-read-tree: remove deleted files in the working directory Only when "-u" is used of course. 2005-06-10 00:34:04 +02:00			`continue;`
write_index(): optimize ce_smudge_racily_clean_entry() calls with CE_UPTODATE When writing the index out, we need to check the work tree again to see if an entry whose timestamp indicates that it could be "racily clean", in order to smudge it if it is stat-clean but with modified contents. However, we can skip this step for entries marked with CE_UPTODATE, which are known to be the really clean (i.e. the one we already have checked when we prepared the index). This will reduce lstat(2) calls necessary in git-status. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-30 18:25:52 +02:00			`if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))`
Racy GIT (part #2) The previous round caught the most trivial case well, but broke down once index file is updated again. Smudge problematic entries (they should be very few if any under normal interactive workflow) before writing a new index file out. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 21:12:18 +01:00			`ce_smudge_racily_clean_entry(ce);`
write_index: optionally allow broken null sha1s Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28) added a safety check preventing git from writing null sha1s into the index. The intent was to catch errors in other parts of the code that might let such an entry slip into the index (or worse, a tree). Some existing repositories may have invalid trees that contain null sha1s already, though. Until 4337b58, a common way to clean this up would be to use git-filter-branch's index-filter to repair such broken entries. That now fails when filter-branch tries to write out the index. Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check and make it easier to recover from such a history. It is tempting to not involve filter-branch in this commit at all, and instead require the user to manually invoke GIT_ALLOW_NULL_SHA1=1 git filter-branch ... to perform an index-filter on a history with trees with null sha1s. That would be slightly safer, but requires some specialized knowledge from the user. So let's set the GIT_ALLOW_NULL_SHA1 variable automatically when checking out the to-be-filtered trees. Advice on using filter-branch to remove such entries already exists on places like stackoverflow, and this patch makes it Just Work again on recent versions of git. Further commands that touch the index will still notice and fail, unless they actually remove the broken entries. A filter-branch whose filters do not touch the index at all will not error out (since we complain of the null sha1 only on writing, not when making a tree out of the index), but this is acceptable, as we still print a loud warning, so the problem is unlikely to go unnoticed. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-27 22:41:12 +02:00			`if (is_null_sha1(ce->sha1)) {`
			`static const char msg[] = "cache entry has null sha1: %s";`
			`static int allow = -1;`

			`if (allow < 0)`
			`allow = git_env_bool("GIT_ALLOW_NULL_SHA1", 0);`
			`if (allow)`
			`warning(msg, ce->name);`
			`else`
			`return error(msg, ce->name);`
			`}`
read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`if (ce_write_entry(&c, newfd, ce, previous_name) < 0)`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`return -1;`
			`}`
read-cache.c: write prefix-compressed names in the index Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-04-04 18:12:43 +02:00			`strbuf_release(&previous_name_buf);`
read-cache/write-cache: optionally return cache checksum SHA1. read_cache_1() and write_cache_1() takes an extra parameter *sha1 that returns the checksum of the index file when non-NULL. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-24 01:52:08 +02:00
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`/* Write extension data here */`
Make read-cache.c "the_index" free. This makes all low-level functions defined in read-cache.c to take an explicit index_state structure as their first parameter, to specify which index to work on. These functions traditionally operated on "the_index" and were named foo_cache(); the counterparts this patch introduces are called foo_index(). The traditional foo_cache() functions are made into macros that give "the_index" to their corresponding foo_index() functions. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-02 08:26:07 +02:00			`if (istate->cache_tree) {`
Replace calls to strbuf_init(&foo, 0) with STRBUF_INIT initializer Many call sites use strbuf_init(&foo, 0) to initialize local strbuf variable "foo" which has not been accessed since its declaration. These can be replaced with a static initialization using the STRBUF_INIT macro which is just as readable, saves a function call, and takes up fewer lines. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-09 21:12:12 +02:00			`struct strbuf sb = STRBUF_INIT;`
Small cache_tree_write refactor. This function cannot fail, make it void. Also make write_one act on a const char* instead of a char*. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-25 10:22:44 +02:00
			`cache_tree_write(&sb, istate->cache_tree);`
			`err = write_index_ext_header(&c, newfd, CACHE_EXT_TREE, sb.len) < 0`
			`\|\| ce_write(&c, newfd, sb.buf, sb.len) < 0;`
			`strbuf_release(&sb);`
			`if (err)`
index: make the index file format extensible. ... and move the cache-tree data into it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-04-25 06:18:58 +02:00			`return -1;`
			`}`
resolve-undo: record resolved conflicts in a new index extension section When resolving a conflict using "git add" to create a stage #0 entry, or "git rm" to remove entries at higher stages, remove_index_entry_at() function is eventually called to remove unmerged (i.e. higher stage) entries from the index. Introduce a "resolve_undo_info" structure and keep track of the removed cache entries, and save it in a new index extension section in the index_state. Operations like "read-tree -m", "merge", "checkout [-m] <branch>" and "reset" are signs that recorded information in the index is no longer necessary. The data is removed from the index extension when operations start; they may leave conflicted entries in the index, and later user actions like "git add" will record their conflicted states afresh. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-25 09:30:51 +01:00			`if (istate->resolve_undo) {`
			`struct strbuf sb = STRBUF_INIT;`

			`resolve_undo_write(&sb, istate->resolve_undo);`
			`err = write_index_ext_header(&c, newfd, CACHE_EXT_RESOLVE_UNDO,`
			`sb.len) < 0`
			`\|\| ce_write(&c, newfd, sb.buf, sb.len) < 0;`
			`strbuf_release(&sb);`
			`if (err)`
			`return -1;`
			`}`
write_index(): update index_state->timestamp after flushing to disk Since this timestamp is used to check for racy-clean files, it is important to keep it uptodate. For the 'git checkout' command without the '-q' option, this make a huge difference. Before, each and every file which was updated, was racy-clean after the call to unpack_trees() and write_index() but before the GIT process ended. And because of the call to show_local_changes() in builtin-checkout.c, we ended up reading those files back into memory, doing a SHA1 to check if the files was really different from the index. And, of course, no file was different. With this fix, 'git checkout' without the '-q' option should now be almost as fast as with the '-q' option, but not quite, as we still do some few lstat(2) calls more without the '-q' option. Below is some average numbers for 10 checkout's to v2.6.27 and 10 to v2.6.25 of the Linux kernel, to show the difference: before (git version 1.6.2.rc1.256.g58a87): 7.860 user 2.427 sys 19.465 real 52.8% CPU faults: 0 major 95331 minor after: 6.184 user 2.160 sys 17.619 real 47.4% CPU faults: 0 major 38994 minor Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-23 19:02:57 +01:00
			`if (ce_flush(&c, newfd) \|\| fstat(newfd, &st))`
			`return -1;`
checkout bugfix: use stat.mtime instead of stat.ctime in two places Commit e1afca4fd "write_index(): update index_state->timestamp after flushing to disk" on 2009-02-23 used stat.ctime to record the timestamp of the index-file. This is wrong, so fix this and use the correct stat.mtime timestamp instead. Commit 110c46a909 "Not all systems use st_[cm]tim field for ns resolution file timestamp" on 2009-03-08, has a similar bug for the builtin-fetch-pack.c file. Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-15 12:38:55 +01:00			`istate->timestamp.sec = (unsigned int)st.st_mtime;`
			`istate->timestamp.nsec = ST_MTIME_NSEC(st);`
write_index(): update index_state->timestamp after flushing to disk Since this timestamp is used to check for racy-clean files, it is important to keep it uptodate. For the 'git checkout' command without the '-q' option, this make a huge difference. Before, each and every file which was updated, was racy-clean after the call to unpack_trees() and write_index() but before the GIT process ended. And because of the call to show_local_changes() in builtin-checkout.c, we ended up reading those files back into memory, doing a SHA1 to check if the files was really different from the index. And, of course, no file was different. With this fix, 'git checkout' without the '-q' option should now be almost as fast as with the '-q' option, but not quite, as we still do some few lstat(2) calls more without the '-q' option. Below is some average numbers for 10 checkout's to v2.6.27 and 10 to v2.6.25 of the Linux kernel, to show the difference: before (git version 1.6.2.rc1.256.g58a87): 7.860 user 2.427 sys 19.465 real 52.8% CPU faults: 0 major 95331 minor after: 6.184 user 2.160 sys 17.619 real 47.4% CPU faults: 0 major 38994 minor Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-02-23 19:02:57 +01:00			`return 0;`
Make "write_cache()" and friends available as generic routines. This is needed for the change to make "read-tree" just read into the cache (and then you do a "checkout-cache" to update your current dir contents). 2005-04-09 21:09:27 +02:00			`}`
Move read_cache_unmerged() to read-cache.c builtin-read-tree has a read_cache_unmerged() which is useful for other builtins, for example builtin-merge uses it as well. Move it to read-cache.c to avoid code duplication. Signed-off-by: Miklos Vajna <vmiklos@frugalware.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-06-27 18:21:58 +02:00
			`/*`
			`* Read the index file that is potentially unmerged into given`
read-cache.c: typofix in comment Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-12-06 02:54:11 +01:00			`* index_state, dropping any unmerged entries. Returns true if`
Move read_cache_unmerged() to read-cache.c builtin-read-tree has a read_cache_unmerged() which is useful for other builtins, for example builtin-merge uses it as well. Move it to read-cache.c to avoid code duplication. Signed-off-by: Miklos Vajna <vmiklos@frugalware.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-06-27 18:21:58 +02:00			`* the index is unmerged. Callers who want to refuse to work`
			`* from an unmerged state can call this and check its return value,`
			`* instead of calling read_cache().`
			`*/`
			`int read_index_unmerged(struct index_state *istate)`
			`{`
			`int i;`
reset --hard/read-tree --reset -u: remove unmerged new paths When aborting a failed merge that has brought in a new path using "git reset --hard" or "git read-tree --reset -u", we used to first forget about the new path (via read_cache_unmerged) and then matched the working tree to what is recorded in the index, thus ending up leaving the new path in the work tree. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-16 01:00:06 +02:00			`int unmerged = 0;`
Move read_cache_unmerged() to read-cache.c builtin-read-tree has a read_cache_unmerged() which is useful for other builtins, for example builtin-merge uses it as well. Move it to read-cache.c to avoid code duplication. Signed-off-by: Miklos Vajna <vmiklos@frugalware.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-06-27 18:21:58 +02:00
			`read_index(istate);`
			`for (i = 0; i < istate->cache_nr; i++) {`
			`struct cache_entry *ce = istate->cache[i];`
reset --hard/read-tree --reset -u: remove unmerged new paths When aborting a failed merge that has brought in a new path using "git reset --hard" or "git read-tree --reset -u", we used to first forget about the new path (via read_cache_unmerged) and then matched the working tree to what is recorded in the index, thus ending up leaving the new path in the work tree. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-16 01:00:06 +02:00			`struct cache_entry *new_ce;`
			`int size, len;`

			`if (!ce_stage(ce))`
Move read_cache_unmerged() to read-cache.c builtin-read-tree has a read_cache_unmerged() which is useful for other builtins, for example builtin-merge uses it as well. Move it to read-cache.c to avoid code duplication. Signed-off-by: Miklos Vajna <vmiklos@frugalware.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-06-27 18:21:58 +02:00			`continue;`
reset --hard/read-tree --reset -u: remove unmerged new paths When aborting a failed merge that has brought in a new path using "git reset --hard" or "git read-tree --reset -u", we used to first forget about the new path (via read_cache_unmerged) and then matched the working tree to what is recorded in the index, thus ending up leaving the new path in the work tree. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-16 01:00:06 +02:00			`unmerged = 1;`
Replace strlen() with ce_namelen() Replace strlen(ce->name) with ce_namelen() in a couple of places which gives us some additional bits of performance. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-06 18:07:30 +02:00			`len = ce_namelen(ce);`
reset --hard/read-tree --reset -u: remove unmerged new paths When aborting a failed merge that has brought in a new path using "git reset --hard" or "git read-tree --reset -u", we used to first forget about the new path (via read_cache_unmerged) and then matched the working tree to what is recorded in the index, thus ending up leaving the new path in the work tree. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-16 01:00:06 +02:00			`size = cache_entry_size(len);`
			`new_ce = xcalloc(1, size);`
			`memcpy(new_ce->name, ce->name, len);`
Strip namelen out of ce_flags into a ce_namelen field Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-07-11 11:22:37 +02:00			`new_ce->ce_flags = create_ce_flags(0) \| CE_CONFLICTED;`
			`new_ce->ce_namelen = len;`
reset --hard/read-tree --reset -u: remove unmerged new paths When aborting a failed merge that has brought in a new path using "git reset --hard" or "git read-tree --reset -u", we used to first forget about the new path (via read_cache_unmerged) and then matched the working tree to what is recorded in the index, thus ending up leaving the new path in the work tree. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-16 01:00:06 +02:00			`new_ce->ce_mode = ce->ce_mode;`
			`if (add_index_entry(istate, new_ce, 0))`
			`return error("%s: cannot drop to stage #0",`
			`ce->name);`
			`i = index_name_pos(istate, new_ce->name, len);`
Move read_cache_unmerged() to read-cache.c builtin-read-tree has a read_cache_unmerged() which is useful for other builtins, for example builtin-merge uses it as well. Move it to read-cache.c to avoid code duplication. Signed-off-by: Miklos Vajna <vmiklos@frugalware.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-06-27 18:21:58 +02:00			`}`
reset --hard/read-tree --reset -u: remove unmerged new paths When aborting a failed merge that has brought in a new path using "git reset --hard" or "git read-tree --reset -u", we used to first forget about the new path (via read_cache_unmerged) and then matched the working tree to what is recorded in the index, thus ending up leaving the new path in the work tree. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-16 01:00:06 +02:00			`return unmerged;`
Move read_cache_unmerged() to read-cache.c builtin-read-tree has a read_cache_unmerged() which is useful for other builtins, for example builtin-merge uses it as well. Move it to read-cache.c to avoid code duplication. Signed-off-by: Miklos Vajna <vmiklos@frugalware.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-06-27 18:21:58 +02:00			`}`
builtin-add.c: restructure the code for maintainability A private function add_files_to_cache() in builtin-add.c was borrowed by checkout and commit re-implementors without getting properly refactored to more library-ish place. This does the refactoring. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-21 10:24:17 +02:00
refactor handling of "other" files in ls-files and status When the "git status" display code was originally converted to C, we copied the code from ls-files to discover whether a pathname returned by read_directory was an "other", or untracked, file. Much later, 5698454e updated the code in ls-files to handle some new cases caused by gitlinks. This left the code in wt-status.c broken: it would display submodule directories as untracked directories. Nobody noticed until now, however, because unless status.showUntrackedFiles was set to "all", submodule directories were not actually reported by read_directory. So the bug was only triggered in the presence of a submodule _and_ this config option. This patch pulls the ls-files code into a new function, cache_name_is_other, and uses it in both places. This should leave the ls-files functionality the same and fix the bug in status. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-16 17:07:26 +02:00			`/*`
			`* Returns 1 if the path is an "other" path with respect to`
			`* the index; that is, the path is not mentioned in the index at all,`
			`* either as a file, a directory with some files in the index,`
			`* or as an unmerged entry.`
			`*`
			`* We helpfully remove a trailing "/" from directories so that`
			`* the output of read_directory can be used as-is.`
			`*/`
			`int index_name_is_other(const struct index_state istate, const char name,`
			`int namelen)`
			`{`
			`int pos;`
			`if (namelen && name[namelen - 1] == '/')`
			`namelen--;`
			`pos = index_name_pos(istate, name, namelen);`
			`if (0 <= pos)`
			`return 0; /* exact match */`
			`pos = -pos - 1;`
			`if (pos < istate->cache_nr) {`
			`struct cache_entry *ce = istate->cache[pos];`
			`if (ce_namelen(ce) == namelen &&`
			`!memcmp(ce->name, name, namelen))`
			`return 0; /* Yup, this one exists unmerged */`
			`}`
			`return 1;`
			`}`
attr.c: extract read_index_data() as read_blob_data_from_index() Extract the read_index_data() function from attr.c and move it to read-cache.c; rename it to read_blob_data_from_index() and update the function signature of it to align better with index/cache API functions. This allows for reusing the function in convert.c later. Signed-off-by: Lukas Fleischer <git@cryptocrack.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-13 15:28:30 +02:00
read_blob_data_from_index(): optionally return the size of blob data This allows for optionally getting the size of the returned data and will be used in a follow-up patch. Signed-off-by: Lukas Fleischer <git@cryptocrack.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-13 15:28:31 +02:00			`void read_blob_data_from_index(struct index_state istate, const char path, unsigned long size)`
attr.c: extract read_index_data() as read_blob_data_from_index() Extract the read_index_data() function from attr.c and move it to read-cache.c; rename it to read_blob_data_from_index() and update the function signature of it to align better with index/cache API functions. This allows for reusing the function in convert.c later. Signed-off-by: Lukas Fleischer <git@cryptocrack.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-13 15:28:30 +02:00			`{`
			`int pos, len;`
			`unsigned long sz;`
			`enum object_type type;`
			`void *data;`

			`len = strlen(path);`
			`pos = index_name_pos(istate, path, len);`
			`if (pos < 0) {`
			`/*`
			`* We might be in the middle of a merge, in which`
			`* case we would read stage #2 (ours).`
			`*/`
			`int i;`
			`for (i = -pos - 1;`
			`(pos < 0 && i < istate->cache_nr &&`
			`!strcmp(istate->cache[i]->name, path));`
			`i++)`
			`if (ce_stage(istate->cache[i]) == 2)`
			`pos = i;`
			`}`
			`if (pos < 0)`
			`return NULL;`
			`data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);`
			`if (!data \|\| type != OBJ_BLOB) {`
			`free(data);`
			`return NULL;`
			`}`
read_blob_data_from_index(): optionally return the size of blob data This allows for optionally getting the size of the returned data and will be used in a follow-up patch. Signed-off-by: Lukas Fleischer <git@cryptocrack.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-13 15:28:31 +02:00			`if (size)`
			`*size = sz;`
attr.c: extract read_index_data() as read_blob_data_from_index() Extract the read_index_data() function from attr.c and move it to read-cache.c; rename it to read_blob_data_from_index() and update the function signature of it to align better with index/cache API functions. This allows for reusing the function in convert.c later. Signed-off-by: Lukas Fleischer <git@cryptocrack.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-13 15:28:30 +02:00			`return data;`
			`}`
add a stat_validity struct It can sometimes be useful to know whether a path in the filesystem has been updated without going to the work of opening and re-reading its content. We trust the stat() information on disk already to handle index updates, and we can use the same trick here. This patch introduces a "stat_validity" struct which encapsulates the concept of checking the stat-freshness of a file. It is implemented on top of "struct stat_data" to reuse the logic about which stat entries to trust for a particular platform, but hides the complexity behind two simple functions: check and update. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-06-20 10:37:51 +02:00
			`void stat_validity_clear(struct stat_validity *sv)`
			`{`
			`free(sv->sd);`
			`sv->sd = NULL;`
			`}`

			`int stat_validity_check(struct stat_validity sv, const char path)`
			`{`
			`struct stat st;`

			`if (stat(path, &st) < 0)`
			`return sv->sd == NULL;`
			`if (!sv->sd)`
			`return 0;`
			`return S_ISREG(st.st_mode) && !match_stat_data(sv->sd, &st);`
			`}`

			`void stat_validity_update(struct stat_validity *sv, int fd)`
			`{`
			`struct stat st;`

			`if (fstat(fd, &st) < 0 \|\| !S_ISREG(st.st_mode))`
			`stat_validity_clear(sv);`
			`else {`
			`if (!sv->sd)`
			`sv->sd = xcalloc(1, sizeof(struct stat_data));`
			`fill_stat_data(sv->sd, &st);`
			`}`
			`}`