mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-13 20:53:02 +01:00

289 lines

7.6 KiB

C

Move name hashing functions into a file of its own It's really totally separate functionality, and if we want to start doing case-insensitive hash lookups, I'd rather do it when it's separated out. It also renames "remove_index_entry()" to "remove_name_hash()", because that really describes the thing better. It doesn't actually remove the index entry, that's done by "remove_index_entry_at()", which is something very different, despite the similarity in names. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 21:16:24 +01:00			`/*`
			`* name-hash.c`
			`*`
			`* Hashing names in the index state`
			`*`
			`* Copyright (C) 2008 Linus Torvalds`
			`*/`
			`#define NO_THE_INDEX_COMPATIBILITY_MACROS`
			`#include "cache.h"`

Make hash_name_lookup able to do case-independent lookups Right now nobody uses it, but "index_name_exists()" gets a flag so you can enable it on a case-by-case basis. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 23:55:19 +01:00			`/*`
			`* This removes bit 5 if bit 6 is set.`
			`*`
			`* That will make US-ASCII characters hash to their upper-case`
			`* equivalent. We could easily do this one whole word at a time,`
			`* but that's for future worries.`
			`*/`
			`static inline unsigned char icase_hash(unsigned char c)`
			`{`
			`return c & ~((c & 0x40) >> 1);`
			`}`
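To make the bit arithmetic in the comment concrete, here is a minimal sketch that copies the expression from `icase_hash()` into a standalone helper. The names `fold_ascii` and `fold_ascii_selfcheck` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/*
 * Hypothetical standalone copy of the icase_hash() expression:
 * if bit 6 (0x40) is set, clear bit 5 (0x20).
 */
static unsigned char fold_ascii(unsigned char c)
{
	return c & ~((c & 0x40) >> 1);
}

/* Checks a few representative bytes; returns 1 when all checks pass. */
static int fold_ascii_selfcheck(void)
{
	assert(fold_ascii('a') == 'A');	/* 0x61 -> 0x41 */
	assert(fold_ascii('A') == 'A');	/* already upper case */
	assert(fold_ascii('7') == '7');	/* 0x37: bit 6 clear, unchanged */
	assert(fold_ascii('/') == '/');	/* 0x2f: bit 6 clear, unchanged */
	assert(fold_ascii('{') == '[');	/* 0x7b -> 0x5b: punctuation folds too */
	return 1;
}
```

Because bytes such as `{` and `[` collapse to the same value, the fold is only suitable for computing a hash. Lookups in this file still compare the names themselves, for example with `strncasecmp()` in `find_dir_entry()`.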

			`static unsigned int hash_name(const char *name, int namelen)`
			`{`
			`unsigned int hash = 0x123;`

name-hash: allow hashing an empty string Usually we do not pass an empty string to the function hash_name() because we almost always ask for hash values for a path that is a candidate to be added to the index. However, check-ignore (and most likely check-attr, but I didn't check) apparently has a callchain to ask the hash value for an empty path when it was given a "." from the top-level directory to ask "Is the path . excluded by default?" Make sure that hash_name() does not overrun the end of the given pathname even when it is empty. Remove a sweep-the-issue-under-the-rug conditional in check-ignore that avoided to pass an empty string to the callchain while at it. It is a valid question to ask for check-ignore if the top-level is set to be ignored by default, even though the answer is most likely no, if only because there is currently no way to specify such an entry in the .gitignore file. But it is an unusual thing to ask and it is not worth optimizing for it by special casing at the top level of the call chain. Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-19 20:56:44 +01:00			`while (namelen--) {`
			`unsigned char c = *name++;`
			`c = icase_hash(c);`
			`hash = hash*101 + c;`
			`}`
			`return hash;`
			`}`
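The empty-string case described in the commit message can be checked with a small sketch. It reproduces the loop of `hash_name()` under hypothetical names (`icase_fold`, `name_hash_sketch`, `name_hash_selfcheck`). The point it illustrates is that `while (namelen--)` tests the length before any byte is read, so a zero length returns the seed value unchanged.

```c
#include <assert.h>

/* Hypothetical copy of the case-folding step used by the hash. */
static unsigned char icase_fold(unsigned char c)
{
	return c & ~((c & 0x40) >> 1);
}

/* Hypothetical copy of the hash_name() loop shown above. */
static unsigned int name_hash_sketch(const char *name, int namelen)
{
	unsigned int hash = 0x123;

	while (namelen--) {	/* length is tested before a byte is read */
		unsigned char c = (unsigned char)*name++;
		hash = hash * 101 + icase_fold(c);
	}
	return hash;
}

/* Returns 1 when all checks pass. */
static int name_hash_selfcheck(void)
{
	/* Empty name: the loop body never runs, the seed is returned. */
	assert(name_hash_sketch("", 0) == 0x123u);
	/* One byte: seed * 101 plus the folded character. */
	assert(name_hash_sketch("a", 1) == 0x123u * 101u + 'A');
	/* Names differing only in ASCII case share a hash. */
	assert(name_hash_sketch("Makefile", 8) == name_hash_sketch("MAKEFILE", 8));
	return 1;
}
```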

name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`struct dir_entry {`
			`struct dir_entry *next;`
			`struct dir_entry *parent;`
			`struct cache_entry *ce;`
			`int nr;`
			`unsigned int namelen;`
			`};`

			`static struct dir_entry *find_dir_entry(struct index_state *istate,`
			`const char *name, unsigned int namelen)`
			`{`
			`unsigned int hash = hash_name(name, namelen);`
			`struct dir_entry *dir;`

			`for (dir = lookup_hash(hash, &istate->dir_hash); dir; dir = dir->next)`
			`if (dir->namelen == namelen &&`
			`!strncasecmp(dir->ce->name, name, namelen))`
			`return dir;`
			`return NULL;`
			`}`

			`static struct dir_entry *hash_dir_entry(struct index_state *istate,`
			`struct cache_entry *ce, int namelen)`
Add case insensitivity support for directories when using git status When using a case preserving but case insensitive file system, directory case can differ but still refer to the same physical directory. git status reports the directory with the alternate case as an Untracked file. (That is, when mydir/filea.txt is added to the repository and then the directory on disk is renamed from mydir/ to MyDir/, git status shows MyDir/ as being untracked.) Support has been added in name-hash.c for hashing directories with a terminating slash into the name hash. When index_name_exists() is called with a directory (a name with a terminating slash), the name is not found via the normal cache_name_compare() call, but it is found in the slow_same_name() function. Additionally, in dir.c, directory_exists_in_index_icase() allows newly added directories deeper in the directory chain to be identified. Ultimately, it would be better if the file list was read in case insensitive alphabetical order from disk, but this change seems to suffice for now. The end result is the directory is looked up in a case insensitive manner and does not show in the Untracked files list. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:43 +02:00			`{`
			`/*`
			`* Throw each directory component in the hash for quick lookup`
			`* during a git status. Directory components are stored with their`
			`* closing slash. Despite submodules being a directory, they never`
			`* reach this point, because they are stored without a closing slash`
			`* in index_state.name_hash (as ordinary cache_entries).`
			`*`
			`* Note that the cache_entry stored with the dir_entry merely`
			`* supplies the name of the directory (up to dir_entry.namelen). We`
			`* track the number of 'active' files in a directory in dir_entry.nr,`
			`* so we can tell if the directory is still relevant, e.g. for git`
			`* status. However, if cache_entries are removed, we cannot pinpoint`
			`* an exact cache_entry that's still active. It is very possible that`
			`* multiple dir_entries point to the same cache_entry.`
			`*/`
			`struct dir_entry *dir;`

			`/* get length of parent directory */`
			`while (namelen > 0 && !is_dir_sep(ce->name[namelen - 1]))`
			`namelen--;`
			`if (namelen <= 0)`
			`return NULL;`

			`/* lookup existing entry for that directory */`
			`dir = find_dir_entry(istate, ce->name, namelen);`
			`if (!dir) {`
			`/* not found, create it and add to hash table */`
			`void **pdir;`
			`unsigned int hash = hash_name(ce->name, namelen);`

			`dir = xcalloc(1, sizeof(struct dir_entry));`
			`dir->namelen = namelen;`
			`dir->ce = ce;`

			`pdir = insert_hash(hash, dir, &istate->dir_hash);`
			`if (pdir) {`
			`dir->next = *pdir;`
			`*pdir = dir;`
			`}`

			`/* recursively add missing parent directories */`
			`dir->parent = hash_dir_entry(istate, ce, namelen - 1);`
			`}`
			`return dir;`
			`}`
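The way `hash_dir_entry()` walks from a file name up through its parent directories can be sketched in isolation. The helper `parent_dir_len` below is hypothetical, and it assumes `/` is the only separator, whereas Git's `is_dir_sep()` may accept others on some platforms. Each recursive step in the function above corresponds to calling the helper again with the previous result minus one, which steps over the trailing slash.

```c
#include <assert.h>

/*
 * Hypothetical helper: length of the directory prefix within the first
 * namelen bytes of name, including its trailing '/', or 0 if there is none.
 * Assumption: '/' is the only directory separator.
 */
static int parent_dir_len(const char *name, int namelen)
{
	while (namelen > 0 && name[namelen - 1] != '/')
		namelen--;
	return namelen;
}

/* Returns 1 when all checks pass. */
static int parent_dir_selfcheck(void)
{
	const char *path = "a/b/c.txt";
	int len = parent_dir_len(path, 9);

	assert(len == 4);			/* "a/b/" */
	len = parent_dir_len(path, len - 1);	/* drop the trailing slash */
	assert(len == 2);			/* "a/" */
	len = parent_dir_len(path, len - 1);
	assert(len == 0);			/* top level: recursion stops */
	return 1;
}
```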

			`static void add_dir_entry(struct index_state *istate, struct cache_entry *ce)`
			`{`
			`/* Add reference to the directory entry (and parents if 0). */`
			`struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce));`
			`while (dir && !(dir->nr++))`
			`dir = dir->parent;`
			`}`

			`static void remove_dir_entry(struct index_state *istate, struct cache_entry *ce)`
			`{`
			`/*`
			`* Release reference to the directory entry (and parents if 0).`
			`*`
			`* Note: we do not remove / free the entry because there's no`
			`* hash.[ch]::remove_hash and dir->next may point to other entries`
			`* that are still valid, so we must not free the memory.`
			`*/`
			`struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce));`
			`while (dir && dir->nr && !(--dir->nr))`
			`dir = dir->parent;`
			`}`
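The reference counting in `add_dir_entry()` and `remove_dir_entry()` can be modeled without the hash table. This is a simplified sketch: `struct dir_node`, `dir_ref`, `dir_unref`, and `dir_refcount_selfcheck` are hypothetical names, and only the two loops are taken from the functions above. A parent's count changes only when a child's count crosses between zero and non-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dir_entry, keeping only parent and nr. */
struct dir_node {
	struct dir_node *parent;
	int nr;
};

/* Same loop as add_dir_entry(): walk up only while a count was 0. */
static void dir_ref(struct dir_node *dir)
{
	while (dir && !(dir->nr++))
		dir = dir->parent;
}

/* Same loop as remove_dir_entry(): walk up only when a count drops to 0. */
static void dir_unref(struct dir_node *dir)
{
	while (dir && dir->nr && !(--dir->nr))
		dir = dir->parent;
}

/* Returns 1 when all checks pass. */
static int dir_refcount_selfcheck(void)
{
	struct dir_node a = { NULL, 0 };	/* models "a/" */
	struct dir_node ab = { &a, 0 };		/* models "a/b/" */

	dir_ref(&ab);	/* first file in a/b/ also activates a/ */
	assert(ab.nr == 1 && a.nr == 1);
	dir_ref(&ab);	/* second file touches only a/b/ */
	assert(ab.nr == 2 && a.nr == 1);
	dir_unref(&ab);
	assert(ab.nr == 1 && a.nr == 1);
	dir_unref(&ab);	/* last file gone: a/b/ and then a/ drop to 0 */
	assert(ab.nr == 0 && a.nr == 0);
	return 1;
}
```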

			`static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)`
			`{`
			`void **pos;`
			`unsigned int hash;`

			`if (ce->ce_flags & CE_HASHED)`
			`return;`
			`ce->ce_flags |= CE_HASHED;`
name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`ce->next = NULL;`
Move name hashing functions into a file of its own It's really totally separate functionality, and if we want to start doing case-insensitive hash lookups, I'd rather do it when it's separated out. It also renames "remove_index_entry()" to "remove_name_hash()", because that really describes the thing better. It doesn't actually remove the index entry, that's done by "remove_index_entry_at()", which is something very different, despite the similarity in names. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 21:16:24 +01:00			`hash = hash_name(ce->name, ce_namelen(ce));`
			`pos = insert_hash(hash, ce, &istate->name_hash);`
			`if (pos) {`
			`ce->next = *pos;`
			`*pos = ce;`
			`}`
Add case insensitivity support for directories when using git status When using a case preserving but case insensitive file system, directory case can differ but still refer to the same physical directory. git status reports the directory with the alternate case as an Untracked file. (That is, when mydir/filea.txt is added to the repository and then the directory on disk is renamed from mydir/ to MyDir/, git status shows MyDir/ as being untracked.) Support has been added in name-hash.c for hashing directories with a terminating slash into the name hash. When index_name_exists() is called with a directory (a name with a terminating slash), the name is not found via the normal cache_name_compare() call, but it is found in the slow_same_name() function. Additionally, in dir.c, directory_exists_in_index_icase() allows newly added directories deeper in the directory chain to be identified. Ultimately, it would be better if the file list was read in case insensitive alphabetical order from disk, but this change seems to suffice for now. The end result is the directory is looked up in a case insensitive manner and does not show in the Untracked files list. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:43 +02:00
name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`if (ignore_case && !(ce->ce_flags & CE_UNHASHED))`
			`add_dir_entry(istate, ce);`
Move name hashing functions into a file of its own It's really totally separate functionality, and if we want to start doing case-insensitive hash lookups, I'd rather do it when it's separated out. It also renames "remove_index_entry()" to "remove_name_hash()", because that really describes the thing better. It doesn't actually remove the index entry, that's done by "remove_index_entry_at()", which is something very different, despite the similarity in names. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 21:16:24 +01:00			`}`

			`static void lazy_init_name_hash(struct index_state *istate)`
			`{`
			`int nr;`

			`if (istate->name_hash_initialized)`
			`return;`
Preallocate hash tables when the number of inserts are known in advance This avoids unnecessary re-allocations and reinsertions. On webkit.git (i.e. about 182k inserts to the name hash table), this reduces about 100ms out of 3s user time. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-17 04:28:06 +01:00			`if (istate->cache_nr)`
			`preallocate_hash(&istate->name_hash, istate->cache_nr);`
Move name hashing functions into a file of its own It's really totally separate functionality, and if we want to start doing case-insensitive hash lookups, I'd rather do it when it's separated out. It also renames "remove_index_entry()" to "remove_name_hash()", because that really describes the thing better. It doesn't actually remove the index entry, that's done by "remove_index_entry_at()", which is something very different, despite the similarity in names. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 21:16:24 +01:00			`for (nr = 0; nr < istate->cache_nr; nr++)`
			`hash_index_entry(istate, istate->cache[nr]);`
			`istate->name_hash_initialized = 1;`
			`}`

			`void add_name_hash(struct index_state *istate, struct cache_entry *ce)`
			`{`
name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`/* if already hashed, add reference to directory entries */`
			`if (ignore_case && (ce->ce_flags & CE_STATE_MASK) == CE_STATE_MASK)`
			`add_dir_entry(istate, ce);`

Move name hashing functions into a file of its own It's really totally separate functionality, and if we want to start doing case-insensitive hash lookups, I'd rather do it when it's separated out. It also renames "remove_index_entry()" to "remove_name_hash()", because that really describes the thing better. It doesn't actually remove the index entry, that's done by "remove_index_entry_at()", which is something very different, despite the similarity in names. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 21:16:24 +01:00			`ce->ce_flags &= ~CE_UNHASHED;`
			`if (istate->name_hash_initialized)`
			`hash_index_entry(istate, ce);`
			`}`

name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`/*`
			`* We don't actually remove it, we can just mark it invalid so that`
			`* we won't find it in lookups.`
			`*`
			`* Not only would we have to search the lists (simple enough), but`
			`* we'd also have to rehash other hash buckets in case this makes the`
			`* hash bucket empty (common). So it's much better to just mark`
			`* it.`
			`*/`
			`void remove_name_hash(struct index_state *istate, struct cache_entry *ce)`
			`{`
			`/* if already hashed, release reference to directory entries */`
			`if (ignore_case && (ce->ce_flags & CE_STATE_MASK) == CE_HASHED)`
			`remove_dir_entry(istate, ce);`

			`ce->ce_flags |= CE_UNHASHED;`
			`}`
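
The comment above `remove_name_hash()` describes lazy deletion: the entry stays in its bucket chain and is only flagged, and lookups skip flagged entries. A minimal standalone sketch of that idea, not the code from this file: `struct entry`, `ENTRY_DEAD`, `mark_removed()` and `chain_lookup()` are hypothetical names, with `ENTRY_DEAD` playing the role of `CE_UNHASHED`.

```c
#include <stddef.h>
#include <string.h>

#define ENTRY_DEAD 1u	/* hypothetical flag, stands in for CE_UNHASHED */

struct entry {
	const char *name;
	unsigned int flags;
	struct entry *next;	/* next entry in the same bucket chain */
};

/* "Remove" by marking only; the chain itself is left untouched. */
static void mark_removed(struct entry *e)
{
	e->flags |= ENTRY_DEAD;
}

/* Walk one bucket chain, skipping entries that were marked removed. */
static struct entry *chain_lookup(struct entry *head, const char *name)
{
	struct entry *e;

	for (e = head; e; e = e->next)
		if (!(e->flags & ENTRY_DEAD) && !strcmp(e->name, name))
			return e;
	return NULL;
}
```

The trade-off is that marked entries still occupy the chain, so a lookup pays for walking past them; in exchange, removal never has to search or relink a bucket.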

Make hash_name_lookup able to do case-independent lookups Right now nobody uses it, but "index_name_exists()" gets a flag so you can enable it on a case-by-case basis. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 23:55:19 +01:00			`static int slow_same_name(const char *name1, int len1, const char *name2, int len2)`
			`{`
			`if (len1 != len2)`
			`return 0;`

			`while (len1) {`
			`unsigned char c1 = *name1++;`
			`unsigned char c2 = *name2++;`
			`len1--;`
			`if (c1 != c2) {`
			`c1 = toupper(c1);`
			`c2 = toupper(c2);`
			`if (c1 != c2)`
			`return 0;`
			`}`
			`}`
			`return 1;`
			`}`

			`static int same_name(const struct cache_entry *ce, const char *name, int namelen, int icase)`
			`{`
			`int len = ce_namelen(ce);`

			`/*`
			`* Always do exact compare, even if we want a case-ignoring comparison;`
			`* we do the quick exact one first, because it will be the common case.`
			`*/`
			`if (len == namelen && !cache_name_compare(name, namelen, ce->name, len))`
			`return 1;`

Add case insensitivity support for directories when using git status When using a case preserving but case insensitive file system, directory case can differ but still refer to the same physical directory. git status reports the directory with the alternate case as an Untracked file. (That is, when mydir/filea.txt is added to the repository and then the directory on disk is renamed from mydir/ to MyDir/, git status shows MyDir/ as being untracked.) Support has been added in name-hash.c for hashing directories with a terminating slash into the name hash. When index_name_exists() is called with a directory (a name with a terminating slash), the name is not found via the normal cache_name_compare() call, but it is found in the slow_same_name() function. Additionally, in dir.c, directory_exists_in_index_icase() allows newly added directories deeper in the directory chain to be identified. Ultimately, it would be better if the file list was read in case insensitive alphabetical order from disk, but this change seems to suffice for now. The end result is the directory is looked up in a case insensitive manner and does not show in the Untracked files list. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:43 +02:00			`if (!icase)`
			`return 0;`

name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`return slow_same_name(name, namelen, ce->name, len);`
Make hash_name_lookup able to do case-independent lookups Right now nobody uses it, but "index_name_exists()" gets a flag so you can enable it on a case-by-case basis. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 23:55:19 +01:00			`}`
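
The case-insensitive fallback in `same_name()` can only find a match if names that differ in case end up in the same bucket, which is what the bit trick in `icase_hash()` near the top of the file provides. A minimal standalone sketch of that pairing, under stated assumptions: `fold_char()`, `fold_hash()` and `names_match()` are hypothetical names, the seed and multiplier in `fold_hash()` are arbitrary illustrative choices, and `memcmp()` stands in for `cache_name_compare()`.

```c
#include <ctype.h>
#include <string.h>

/* Same bit trick as icase_hash(): clear bit 5 when bit 6 is set. */
static unsigned char fold_char(unsigned char c)
{
	return c & ~((c & 0x40) >> 1);
}

/* Hypothetical case-folding hash; seed and multiplier are arbitrary. */
static unsigned int fold_hash(const char *name, int namelen)
{
	unsigned int hash = 0x123;

	while (namelen--)
		hash = hash * 101 + fold_char((unsigned char)*name++);
	return hash;
}

/* Exact compare first (the common case), case-insensitive fallback if icase. */
static int names_match(const char *a, int alen, const char *b, int blen, int icase)
{
	int i;

	if (alen != blen)
		return 0;
	if (!memcmp(a, b, alen))
		return 1;
	if (!icase)
		return 0;
	for (i = 0; i < alen; i++)
		if (toupper((unsigned char)a[i]) != toupper((unsigned char)b[i]))
			return 0;
	return 1;
}
```

Note that the bit trick also folds a few non-letters together (for example `{` and `[`). That only adds hash collisions; the comparison step still tells such names apart.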

			`struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase)`
Move name hashing functions into a file of its own It's really totally separate functionality, and if we want to start doing case-insensitive hash lookups, I'd rather do it when it's separated out. It also renames "remove_index_entry()" to "remove_name_hash()", because that really describes the thing better. It doesn't actually remove the index entry, that's done by "remove_index_entry_at()", which is something very different, despite the similarity in names. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 21:16:24 +01:00			`{`
			`unsigned int hash = hash_name(name, namelen);`
			`struct cache_entry *ce;`

			`lazy_init_name_hash(istate);`
			`ce = lookup_hash(hash, &istate->name_hash);`

			`while (ce) {`
			`if (!(ce->ce_flags & CE_UNHASHED)) {`
Make hash_name_lookup able to do case-independent lookups Right now nobody uses it, but "index_name_exists()" gets a flag so you can enable it on a case-by-case basis. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 23:55:19 +01:00			`if (same_name(ce, name, namelen, icase))`
Make "index_name_exists()" return the cache_entry it found This allows verify_absent() in unpack_trees() to use the hash chains rather than looking it up using the binary search. Perhaps more importantly, it's also going to be useful for the next phase, where we actually start looking at the cache entry when we do case-insensitive lookups and checking the result. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 23:53:00 +01:00			`return ce;`
Move name hashing functions into a file of its own It's really totally separate functionality, and if we want to start doing case-insensitive hash lookups, I'd rather do it when it's separated out. It also renames "remove_index_entry()" to "remove_name_hash()", because that really describes the thing better. It doesn't actually remove the index entry, that's done by "remove_index_entry_at()", which is something very different, despite the similarity in names. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 21:16:24 +01:00			`}`
name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`ce = ce->next;`
Move name hashing functions into a file of its own It's really totally separate functionality, and if we want to start doing case-insensitive hash lookups, I'd rather do it when it's separated out. It also renames "remove_index_entry()" to "remove_name_hash()", because that really describes the thing better. It doesn't actually remove the index entry, that's done by "remove_index_entry_at()", which is something very different, despite the similarity in names. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 21:16:24 +01:00			`}`
Add case insensitivity support for directories when using git status When using a case preserving but case insensitive file system, directory case can differ but still refer to the same physical directory. git status reports the directory with the alternate case as an Untracked file. (That is, when mydir/filea.txt is added to the repository and then the directory on disk is renamed from mydir/ to MyDir/, git status shows MyDir/ as being untracked.) Support has been added in name-hash.c for hashing directories with a terminating slash into the name hash. When index_name_exists() is called with a directory (a name with a terminating slash), the name is not found via the normal cache_name_compare() call, but it is found in the slow_same_name() function. Additionally, in dir.c, directory_exists_in_index_icase() allows newly added directories deeper in the directory chain to be identified. Ultimately, it would be better if the file list was read in case insensitive alphabetical order from disk, but this change seems to suffice for now. The end result is the directory is looked up in a case insensitive manner and does not show in the Untracked files list. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:43 +02:00
			`/*`
name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`* When looking for a directory (trailing '/'), it might be a`
			`* submodule or a directory. Despite submodules being directories,`
Add case insensitivity support for directories when using git status When using a case preserving but case insensitive file system, directory case can differ but still refer to the same physical directory. git status reports the directory with the alternate case as an Untracked file. (That is, when mydir/filea.txt is added to the repository and then the directory on disk is renamed from mydir/ to MyDir/, git status shows MyDir/ as being untracked.) Support has been added in name-hash.c for hashing directories with a terminating slash into the name hash. When index_name_exists() is called with a directory (a name with a terminating slash), the name is not found via the normal cache_name_compare() call, but it is found in the slow_same_name() function. Additionally, in dir.c, directory_exists_in_index_icase() allows newly added directories deeper in the directory chain to be identified. Ultimately, it would be better if the file list was read in case insensitive alphabetical order from disk, but this change seems to suffice for now. The end result is the directory is looked up in a case insensitive manner and does not show in the Untracked files list. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:43 +02:00			`* they are stored in the name hash without a closing slash.`
name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`* When ignore_case is 1, directories are stored in a separate hash`
			`* table with their closing slash.`
Add case insensitivity support for directories when using git status When using a case preserving but case insensitive file system, directory case can differ but still refer to the same physical directory. git status reports the directory with the alternate case as an Untracked file. (That is, when mydir/filea.txt is added to the repository and then the directory on disk is renamed from mydir/ to MyDir/, git status shows MyDir/ as being untracked.) Support has been added in name-hash.c for hashing directories with a terminating slash into the name hash. When index_name_exists() is called with a directory (a name with a terminating slash), the name is not found via the normal cache_name_compare() call, but it is found in the slow_same_name() function. Additionally, in dir.c, directory_exists_in_index_icase() allows newly added directories deeper in the directory chain to be identified. Ultimately, it would be better if the file list was read in case insensitive alphabetical order from disk, but this change seems to suffice for now. The end result is the directory is looked up in a case insensitive manner and does not show in the Untracked files list. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:43 +02:00			`*`
			`* The side effect of this storage technique is that we need to`
name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`* lookup the directory in a separate hash table, and if not found`
Add case insensitivity support for directories when using git status When using a case preserving but case insensitive file system, directory case can differ but still refer to the same physical directory. git status reports the directory with the alternate case as an Untracked file. (That is, when mydir/filea.txt is added to the repository and then the directory on disk is renamed from mydir/ to MyDir/, git status shows MyDir/ as being untracked.) Support has been added in name-hash.c for hashing directories with a terminating slash into the name hash. When index_name_exists() is called with a directory (a name with a terminating slash), the name is not found via the normal cache_name_compare() call, but it is found in the slow_same_name() function. Additionally, in dir.c, directory_exists_in_index_icase() allows newly added directories deeper in the directory chain to be identified. Ultimately, it would be better if the file list was read in case insensitive alphabetical order from disk, but this change seems to suffice for now. The end result is the directory is looked up in a case insensitive manner and does not show in the Untracked files list. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:43 +02:00			`* remove the slash from name and perform the lookup again without`
			`* the slash. If a match is made, S_ISGITLINK(ce->ce_mode) will be`
			`* true.`
			`*/`
			`if (icase && name[namelen - 1] == '/') {`
			`struct dir_entry *dir = find_dir_entry(istate, name, namelen);`
			`if (dir && dir->nr)`
			`return dir->ce;`

			`ce = index_name_exists(istate, name, namelen - 1, icase);`
			`if (ce && S_ISGITLINK(ce->ce_mode))`
			`return ce;`
			`}`
Make "index_name_exists()" return the cache_entry it found This allows verify_absent() in unpack_trees() to use the hash chains rather than looking it up using the binary search. Perhaps more importantly, it's also going to be useful for the next phase, where we actually start looking at the cache entry when we do case-insensitive lookups and checking the result. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 23:53:00 +01:00			`return NULL;`
			`}`
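The directory lookup above depends on the reference-counted directory index described in the "fix endless loop" commit message. The following is a minimal, self-contained sketch of that idea, not the code from name-hash.c: every `demo_*` name is hypothetical, a single linked chain stands in for the separate hash table, and names are limited to a small fixed buffer. It shows each directory owning its own `next` pointer, parents being created before children are counted, and a trailing-slash lookup succeeding only while the directory still contains files.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a directory entry; not git's struct dir_entry. */
struct demo_dir {
	struct demo_dir *next;   /* chain pointer owned by this directory only */
	struct demo_dir *parent; /* NULL for a top-level directory */
	int nr;                  /* files and subdirectories still referencing it */
	size_t len;
	char name[64];           /* stored with its trailing '/' */
};

static struct demo_dir *demo_dirs; /* assumption: one chain, no real hash table */

static int demo_same_icase(const char *a, const char *b, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (toupper((unsigned char)a[i]) != toupper((unsigned char)b[i]))
			return 0;
	return 1;
}

static struct demo_dir *demo_find_dir(const char *name, size_t len)
{
	for (struct demo_dir *d = demo_dirs; d; d = d->next)
		if (d->len == len && demo_same_icase(d->name, name, len))
			return d;
	return NULL;
}

/* Find or create the directory containing path[0..len), parents included. */
static struct demo_dir *demo_hash_dir(const char *path, size_t len)
{
	while (len && path[len - 1] != '/')
		len--;               /* cut back to the last '/' */
	if (!len)
		return NULL;         /* top-level file: no directory entry */

	struct demo_dir *d = demo_find_dir(path, len);
	if (!d) {
		d = calloc(1, sizeof(*d));
		assert(d && len < sizeof(d->name)); /* sketch-only error handling */
		memcpy(d->name, path, len);
		d->len = len;
		d->next = demo_dirs;
		demo_dirs = d;
		d->parent = demo_hash_dir(path, len - 1);
	}
	return d;
}

static void demo_add_file(const char *path)
{
	struct demo_dir *d = demo_hash_dir(path, strlen(path));
	/* Only a 0 -> 1 transition needs to be propagated to the parent. */
	while (d && !(d->nr++))
		d = d->parent;
}

static void demo_remove_file(const char *path)
{
	struct demo_dir *d = demo_hash_dir(path, strlen(path));
	/* Only a 1 -> 0 transition needs to be propagated to the parent. */
	while (d && d->nr && !(--d->nr))
		d = d->parent;
}

/* Case-insensitive check for a name that ends in '/'. */
static int demo_dir_exists(const char *name)
{
	struct demo_dir *d = demo_find_dir(name, strlen(name));
	return d && d->nr > 0;
}
```

After `demo_add_file("Src/Lib/a.c")`, both `demo_dir_exists("src/lib/")` and `demo_dir_exists("SRC/")` are true; once the file is removed, both counts drop to zero and the lookups fail even though the entries are still allocated.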
			`static int free_dir_entry(void *entry, void *unused)`
			`{`
			`struct dir_entry *dir = entry;`
			`while (dir) {`
			`struct dir_entry *next = dir->next;`
			`free(dir);`
			`dir = next;`
			`}`
			`return 0;`
			`}`

			`void free_name_hash(struct index_state *istate)`
			`{`
			`if (!istate->name_hash_initialized)`
			`return;`
			`istate->name_hash_initialized = 0;`
			`if (ignore_case)`
			`/* free directory entries */`
			`for_each_hash(&istate->dir_hash, free_dir_entry, NULL);`

			`free_hash(&istate->name_hash);`
			`free_hash(&istate->dir_hash);`
			`}`
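The chain walk in `free_dir_entry()` reads the `next` pointer before calling `free()`, because the node's memory must not be touched once it has been released. A small standalone sketch of the same pattern follows; the `demo_node` type and both helper names are hypothetical and not part of git.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical chained-bucket node; not a git structure. */
struct demo_node {
	struct demo_node *next;
	int value;
};

/* Prepend a node to a chain and return the new head. */
static struct demo_node *demo_push(struct demo_node *head, int value)
{
	struct demo_node *n = malloc(sizeof(*n));
	assert(n); /* sketch only: real code would handle allocation failure */
	n->value = value;
	n->next = head;
	return n;
}

/* Free every node in a chain and return how many were released. */
static int demo_free_chain(struct demo_node *n)
{
	int freed = 0;
	while (n) {
		struct demo_node *next = n->next; /* save before the node is freed */
		free(n);
		n = next;
		freed++;
	}
	return freed;
}
```

Calling `demo_free_chain()` on an empty chain is a no-op, which matches how the loop in `free_dir_entry()` behaves when handed a NULL entry.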