mirrors/git

mirror of https://github.com/git/git.git synced 2024-10-31 22:37:54 +01:00

728 lines

18 KiB

C

Move name hashing functions into a file of its own It's really totally separate functionality, and if we want to start doing case-insensitive hash lookups, I'd rather do it when it's separated out. It also renames "remove_index_entry()" to "remove_name_hash()", because that really describes the thing better. It doesn't actually remove the index entry, that's done by "remove_index_entry_at()", which is something very different, despite the similarity in names. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 21:16:24 +01:00			`/*`
			`* name-hash.c`
			`*`
			`* Hashing names in the index state`
			`*`
			`* Copyright (C) 2008 Linus Torvalds`
			`*/`
			`#define NO_THE_INDEX_COMPATIBILITY_MACROS`
			`#include "cache.h"`

name-hash.c: fix endless loop with core.ignorecase=true With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-28 00:57:48 +01:00			`struct dir_entry {`
name-hash.c: use new hash map implementation for directories Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-14 20:20:58 +01:00			`struct hashmap_entry ent;`
			`struct dir_entry *parent;`
			`int nr;`
			`unsigned int namelen;`
name-hash: don't reuse cache_entry in dir_entry Stop reusing cache_entry in dir_entry; doing so causes a use-after-free bug. During merges, we free entries that we no longer need in the destination index. But those entries might have also been stored in the dir_entry cache, and when a later call to add_to_index found them, they would be used after being freed. To prevent this, change dir_entry to store a copy of the name instead of a pointer to a cache_entry. This entails some refactoring of code that expects the cache_entry. Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote the initial patch, but this version does not use any of Keith's code. Helped-by: Keith McGuigan <kmcguigan@twitter.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-10-21 19:54:11 +02:00			`char name[FLEX_ARRAY];`
			`};`
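
The commit message above says reference counting tracks which directory entries contain files, and `struct dir_entry` carries a `parent` pointer and an `nr` counter for that purpose. The code that maintains the counter is not part of this excerpt, so the following is only a minimal sketch of how such a parent-chain count could work. `toy_dir`, `toy_dir_ref` and `toy_dir_unref` are hypothetical names, not functions from name-hash.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct dir_entry: only the fields the sketch needs. */
struct toy_dir {
	struct toy_dir *parent; /* enclosing directory, NULL at the top level */
	int nr;                 /* number of references held on this directory */
};

/* Take a reference; the first reference on a directory also pins its parent. */
static void toy_dir_ref(struct toy_dir *dir)
{
	while (dir && !(dir->nr++))
		dir = dir->parent;
}

/* Drop a reference; releasing the last one also releases the parent. */
static void toy_dir_unref(struct toy_dir *dir)
{
	while (dir && !(--dir->nr))
		dir = dir->parent;
}
```

With this scheme a directory's count is the number of files directly in it plus the number of referenced subdirectories, so a count of zero means nothing tracked lives below it any more.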

hashmap.h: compare function has access to a data field When using the hashmap a common need is to have access to caller provided data in the compare function. A couple of times we abuse the keydata field to pass in the data needed. This happens for example in patch-ids.c. This patch changes the function signature of the compare function to have one more void pointer available. The pointer given for each invocation of the compare function must be defined in the init function of the hashmap and is just passed through. Documentation of this new feature is deferred to a later patch. This is a rather mechanical conversion, just adding the new pass-through parameter. However while at it improve the naming of the fields of all compare functions used by hashmaps by ensuring unused parameters are prefixed with 'unused_' and naming the parameters what they are (instead of 'unused' make it 'unused_keydata'). Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-06-30 21:14:05 +02:00			`static int dir_entry_cmp(const void *unused_cmp_data,`
name-hash.c: drop hashmap_cmp_fn cast Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-07-01 02:28:37 +02:00			`const void *entry,`
			`const void *entry_or_key,`
			`const void *keydata)`
			`{`
			`const struct dir_entry *e1 = entry;`
			`const struct dir_entry *e2 = entry_or_key;`
			`const char *name = keydata;`

			`return e1->namelen != e2->namelen || strncasecmp(e1->name,`
			`name ? name : e2->name, e1->namelen);`
			`}`
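
The comparison function returns 0 for a match and prefers `keydata` over the name stored in the second entry. This is what lets a caller look up a directory with a key that carries only a length. The sketch below reproduces that contract outside of git. `toy_entry`, `toy_entry_cmp` and `toy_key` are hypothetical names, and the fixed-size `name` buffer replaces the flexible array member purely to keep the example short.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Hypothetical, fixed-size stand-in for struct dir_entry. */
struct toy_entry {
	unsigned int namelen;
	char name[32];
};

/* Same shape as dir_entry_cmp(): 0 means "equal", non-zero means "different". */
static int toy_entry_cmp(const struct toy_entry *e1,
			 const struct toy_entry *e2,
			 const char *keydata)
{
	return e1->namelen != e2->namelen ||
	       strncasecmp(e1->name, keydata ? keydata : e2->name, e1->namelen);
}

/* Build a lookup key that carries only the length; the name travels as keydata. */
static struct toy_entry toy_key(unsigned int namelen)
{
	struct toy_entry key;

	memset(&key, 0, sizeof(key));
	key.namelen = namelen;
	return key;
}
```

Because only the first `namelen` bytes are compared, the caller can pass a full path such as `"mydir/file.txt"` as `keydata` together with a length of 5 and still match a stored `"MyDir"`.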

name-hash: perf improvement for lazy_init_name_hash Improve performance of lazy_init_name_hash() when ignore_case is set. Teach name-hash to build the istate.name_hash and istate.dir_hash simultaneously using a forward-diving technique on the pathname of the index_entry, rather than adding name_hash entries and then searching backwards in the pathname for parent directories. This borrows algorithm ideas from clear_ce_flags_{1,dir}. Multiple threads are used with the new algorithm to speed hashmap construction. This new code path is only used when threads are present (a compiler settings) and when the index is large enough to warrant the pthread complexity. The code in clear_ce_flags_dir() uses a linear search to find the adjacent index entries with the same prefix; a binary search is used here handle_range_dir() to further speed things up. The size of LAZY_THREAD_COST was determined from rough analysis using: t/helper/test-lazy-init-name-hash --analyze Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-23 14:47:03 +01:00			`static struct dir_entry *find_dir_entry__hash(struct index_state *istate,`
			`const char *name, unsigned int namelen, unsigned int hash)`
			`{`
			`struct dir_entry key;`
			`hashmap_entry_init(&key, hash);`
			`key.namelen = namelen;`
			`return hashmap_get(&istate->dir_hash, &key, name);`
			`}`

			`static struct dir_entry *find_dir_entry(struct index_state *istate,`
			`const char *name, unsigned int namelen)`
			`{`
			`return find_dir_entry__hash(istate, name, namelen, memihash(name, namelen));`
			`}`
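
`find_dir_entry()` derives the bucket from `memihash(name, namelen)`, so names that differ only in case must hash to the same value. The definition of `memihash()` is not shown in this file. The sketch below is an assumption about how a case-folding hash of this kind can be written, using FNV-1 with ASCII-only folding, and `toy_memihash` is a hypothetical name.

```c
#include <stddef.h>

/* FNV-1 constants (32-bit); whether memihash() uses exactly these is an assumption. */
#define TOY_FNV32_BASE  0x811c9dc5u
#define TOY_FNV32_PRIME 0x01000193u

/* Hash len bytes, folding ASCII lowercase letters to uppercase first. */
static unsigned int toy_memihash(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	unsigned int hash = TOY_FNV32_BASE;

	while (len--) {
		unsigned int c = *p++;

		if (c >= 'a' && c <= 'z')
			c -= 'a' - 'A';
		hash = (hash * TOY_FNV32_PRIME) ^ c;
	}
	return hash;
}
```

Because the length is passed explicitly, hashing the first 5 bytes of `"mydir/file.txt"` gives the same value as hashing `"MYDIR"`, which is what a prefix lookup on a path needs.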

			`static struct dir_entry *hash_dir_entry(struct index_state *istate,`
			`struct cache_entry *ce, int namelen)`
Add case insensitivity support for directories when using git status When using a case preserving but case insensitive file system, directory case can differ but still refer to the same physical directory. git status reports the directory with the alternate case as an Untracked file. (That is, when mydir/filea.txt is added to the repository and then the directory on disk is renamed from mydir/ to MyDir/, git status shows MyDir/ as being untracked.) Support has been added in name-hash.c for hashing directories with a terminating slash into the name hash. When index_name_exists() is called with a directory (a name with a terminating slash), the name is not found via the normal cache_name_compare() call, but it is found in the slow_same_name() function. Additionally, in dir.c, directory_exists_in_index_icase() allows newly added directories deeper in the directory chain to be identified. Ultimately, it would be better if the file list was read in case insensitive alphabetical order from disk, but this change seems to suffice for now. The end result is the directory is looked up in a case insensitive manner and does not show in the Untracked files list. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:43 +02:00			`{`
			`/*`
			`* Throw each directory component in the hash for quick lookup`
name-hash: stop storing trailing '/' on paths in index_state.dir_hash When 5102c617 (Add case insensitivity support for directories when using git status, 2010-10-03) added directories to the name-hash there was only a single hash table in which both real cache entries and leading directory prefixes were registered. To distinguish between the two types of entries, directories were stored with a trailing '/'. 2092678c (name-hash.c: fix endless loop with core.ignorecase=true, 2013-02-28), however, moved directories to a separate hash table (index_state.dir_hash) but retained the (now) redundant trailing '/', thus callers continue to bear the burden of ensuring the slash's presence before searching the index for a directory. Eliminate this redundancy by storing paths in the dir-hash without the trailing '/'. An important benefit of this change is that it eliminates undocumented and dangerous behavior of dir.c:directory_exists_in_index_icase() in which it assumes not only that it can validly access one character beyond the end of its incoming directory argument, but also that that character will unconditionally be a '/'. This perilous behavior was "tolerated" because the string passed in by its lone caller always had a '/' in that position, however, things broke [1] when 2eac2a4c (ls-files -k: a directory only can be killed if the index has a non-directory, 2013-08-15) added a new caller which failed to respect the undocumented assumption. [1]: http://thread.gmane.org/gmane.comp.version-control.git/232727 Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-09-17 09:06:16 +02:00			`* during a git status. Directory components are stored without their`
			`* closing slash. Despite submodules being a directory, they never`
			`* reach this point, because they are stored`
			`* in index_state.name_hash (as ordinary cache_entries).`
			`*/`
			`struct dir_entry *dir;`

			`/* get length of parent directory */`
			`while (namelen > 0 && !is_dir_sep(ce->name[namelen - 1]))`
			`namelen--;`
			`if (namelen <= 0)`
			`return NULL;`
name-hash: stop storing trailing '/' on paths in index_state.dir_hash When 5102c617 (Add case insensitivity support for directories when using git status, 2010-10-03) added directories to the name-hash there was only a single hash table in which both real cache entries and leading directory prefixes were registered. To distinguish between the two types of entries, directories were stored with a trailing '/'. 2092678c (name-hash.c: fix endless loop with core.ignorecase=true, 2013-02-28), however, moved directories to a separate hash table (index_state.dir_hash) but retained the (now) redundant trailing '/', thus callers continue to bear the burden of ensuring the slash's presence before searching the index for a directory. Eliminate this redundancy by storing paths in the dir-hash without the trailing '/'. An important benefit of this change is that it eliminates undocumented and dangerous behavior of dir.c:directory_exists_in_index_icase() in which it assumes not only that it can validly access one character beyond the end of its incoming directory argument, but also that that character will unconditionally be a '/'. This perilous behavior was "tolerated" because the string passed in by its lone caller always had a '/' in that position, however, things broke [1] when 2eac2a4c (ls-files -k: a directory only can be killed if the index has a non-directory, 2013-08-15) added a new caller which failed to respect the undocumented assumption. [1]: http://thread.gmane.org/gmane.comp.version-control.git/232727 Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-09-17 09:06:16 +02:00			`namelen--;`
			`/* lookup existing entry for that directory */`
			`dir = find_dir_entry(istate, ce->name, namelen);`
			`if (!dir) {`
			`/* not found, create it and add to hash table */`
convert trivial cases to FLEX_ARRAY macros Using FLEX_ARRAY macros reduces the amount of manual computation size we have to do. It also ensures we don't overflow size_t, and it makes sure we write the same number of bytes that we allocated. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-22 23:44:32 +01:00			`FLEX_ALLOC_MEM(dir, name, ce->name, namelen);`
name-hash.c: use new hash map implementation for directories Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-14 20:20:58 +01:00			`hashmap_entry_init(dir, memihash(ce->name, namelen));`
			`dir->namelen = namelen;`
			`hashmap_add(&istate->dir_hash, dir);`
			`/* recursively add missing parent directories */`
			`dir->parent = hash_dir_entry(istate, ce, namelen);`
Add case insensitivity support for directories when using git status When using a case preserving but case insensitive file system, directory case can differ but still refer to the same physical directory. git status reports the directory with the alternate case as an Untracked file. (That is, when mydir/filea.txt is added to the repository and then the directory on disk is renamed from mydir/ to MyDir/, git status shows MyDir/ as being untracked.) Support has been added in name-hash.c for hashing directories with a terminating slash into the name hash. When index_name_exists() is called with a directory (a name with a terminating slash), the name is not found via the normal cache_name_compare() call, but it is found in the slow_same_name() function. Additionally, in dir.c, directory_exists_in_index_icase() allows newly added directories deeper in the directory chain to be identified. Ultimately, it would be better if the file list was read in case insensitive alphabetical order from disk, but this change seems to suffice for now. The end result is the directory is looked up in a case insensitive manner and does not show in the Untracked files list. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:43 +02:00			`}`
			`return dir;`
			`}`
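The parent-length scan at the top of `hash_dir_entry()` can be tried in isolation. The following is only a sketch under stated assumptions: `sketch_is_dir_sep()` and `sketch_parent_dir_len()` are hypothetical names, not part of git, and `/` is assumed to be the only directory separator (git's own `is_dir_sep()` may accept more on some platforms).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: '/' is the only directory separator in this sketch. */
static int sketch_is_dir_sep(char c)
{
	return c == '/';
}

/*
 * Return the length of the parent directory of name[0..namelen),
 * not counting the separator, or 0 when there is no parent directory.
 */
static size_t sketch_parent_dir_len(const char *name, size_t namelen)
{
	assert(name != NULL);

	/* walk back to the last separator */
	while (namelen > 0 && !sketch_is_dir_sep(name[namelen - 1]))
		namelen--;
	if (namelen == 0)
		return 0;

	/* drop the separator itself, as the dir hash stores no trailing '/' */
	return namelen - 1;
}
```

Feeding the result back in as the new length walks up one level at a time (`a/b/c.txt` gives `a/b`, then `a`, then nothing), which mirrors how the recursive call adds missing parent directories.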

			`static void add_dir_entry(struct index_state *istate, struct cache_entry *ce)`
			`{`
			`/* Add reference to the directory entry (and parents if 0). */`
			`struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce));`
			`while (dir && !(dir->nr++))`
			`dir = dir->parent;`
			`}`

			`static void remove_dir_entry(struct index_state *istate, struct cache_entry *ce)`
			`{`
			`/*`
name-hash.c: remove unreferenced directory entries The new hashmap implementation supports remove, so remove and free directory entries that are no longer referenced by active cache entries. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-14 20:21:26 +01:00			`* Release reference to the directory entry. If 0, remove and continue`
			`* with parent directory.`
			`*/`
			`struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce));`
			`while (dir && !(--dir->nr)) {`
			`struct dir_entry *parent = dir->parent;`
			`hashmap_remove(&istate->dir_hash, dir, NULL);`
			`free(dir);`
			`dir = parent;`
			`}`
			`}`
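The two loops in `add_dir_entry()` and `remove_dir_entry()` form a reference-counting scheme along the parent chain. A minimal sketch of just that counting logic follows. It assumes a hypothetical `struct sketch_dir` with no hash table behind it, so the `hashmap_remove()` step is left out, and it returns the number of freed nodes purely to make the behaviour observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dir_entry, without the hashmap part. */
struct sketch_dir {
	struct sketch_dir *parent;
	int nr; /* references held by files and by referenced subdirectories */
};

static struct sketch_dir *sketch_dir_new(struct sketch_dir *parent)
{
	struct sketch_dir *d = calloc(1, sizeof(*d));
	assert(d != NULL);
	d->parent = parent;
	return d;
}

/* Take a reference; a directory going from 0 to 1 also references its parent. */
static void sketch_dir_ref(struct sketch_dir *dir)
{
	while (dir && !(dir->nr++))
		dir = dir->parent;
}

/*
 * Drop a reference; a directory that reaches 0 is freed and in turn
 * releases its parent. Returns how many directories were freed.
 */
static int sketch_dir_unref(struct sketch_dir *dir)
{
	int freed = 0;

	while (dir && !(--dir->nr)) {
		struct sketch_dir *parent = dir->parent;
		free(dir);
		freed++;
		dir = parent;
	}
	return freed;
}
```

A parent is touched only when a child crosses between 0 and 1, so adding a second file to an already referenced directory costs a single increment.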

			`static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)`
			`{`
			`if (ce->ce_flags & CE_HASHED)`
			`return;`
			`ce->ce_flags |= CE_HASHED;`
name-hash.c: use new hash map implementation for cache entries Note: the "ce->next = NULL;" in unpack-trees.c::do_add_entry can safely be removed, as ce->next (now ce->ent.next) is always properly initialized in name-hash.c::hash_index_entry. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-14 20:21:58 +01:00			`hashmap_entry_init(ce, memihash(ce->name, ce_namelen(ce)));`
			`hashmap_add(&istate->name_hash, ce);`
name-hash.c: remove cache entries instead of marking them CE_UNHASHED The new hashmap implementation supports remove, so really remove unused cache entries from the name hashmap instead of just marking them. The CE_UNHASHED flag and CE_STATE_MASK are no longer needed. Keep the CE_HASHED flag to prevent adding entries twice. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-14 20:22:27 +01:00			`if (ignore_case)`
			`add_dir_entry(istate, ce);`
			`}`

hashmap.h: compare function has access to a data field When using the hashmap a common need is to have access to caller provided data in the compare function. A couple of times we abuse the keydata field to pass in the data needed. This happens for example in patch-ids.c. This patch changes the function signature of the compare function to have one more void pointer available. The pointer given for each invocation of the compare function must be defined in the init function of the hashmap and is just passed through. Documentation of this new feature is deferred to a later patch. This is a rather mechanical conversion, just adding the new pass-through parameter. However while at it improve the naming of the fields of all compare functions used by hashmaps by ensuring unused parameters are prefixed with 'unused_' and naming the parameters what they are (instead of 'unused' make it 'unused_keydata'). Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-06-30 21:14:05 +02:00			`static int cache_entry_cmp(const void *unused_cmp_data,`
name-hash.c: drop hashmap_cmp_fn cast Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-07-01 02:28:37 +02:00			`const void *entry,`
			`const void *entry_or_key,`
			`const void *remove)`
			`{`
			`const struct cache_entry *ce1 = entry;`
			`const struct cache_entry *ce2 = entry_or_key;`
			`/*`
			`* For remove_name_hash, find the exact entry (pointer equality); for`
name-hash: retire unused index_name_exists() db5360f3f496 (name-hash: refactor polymorphic index_name_exists(); 2013-09-17) split index_name_exists() into index_file_exists() and index_dir_exists() but retained index_name_exists() as a thin wrapper to avoid disturbing possible in-flight topics. Since this change landed in 'master' some time ago and there are no in-flight topics referencing index_name_exists(), retire it. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-01-02 22:57:12 +01:00			`* index_file_exists, find all entries with matching hash code and`
			`* decide whether the entry matches in same_name.`
			`*/`
			`return remove ? !(ce1 == ce2) : 0;`
			`}`

name-hash: perf improvement for lazy_init_name_hash Improve performance of lazy_init_name_hash() when ignore_case is set. Teach name-hash to build the istate.name_hash and istate.dir_hash simultaneously using a forward-diving technique on the pathname of the index_entry, rather than adding name_hash entries and then searching backwards in the pathname for parent directories. This borrows algorithm ideas from clear_ce_flags_{1,dir}. Multiple threads are used with the new algorithm to speed hashmap construction. This new code path is only used when threads are present (a compiler settings) and when the index is large enough to warrant the pthread complexity. The code in clear_ce_flags_dir() uses a linear search to find the adjacent index entries with the same prefix; a binary search is used here handle_range_dir() to further speed things up. The size of LAZY_THREAD_COST was determined from rough analysis using: t/helper/test-lazy-init-name-hash --analyze Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-23 14:47:03 +01:00			`static int lazy_try_threaded = 1;`
			`static int lazy_nr_dir_threads;`

			`#ifdef NO_PTHREADS`

			`static inline int lookup_lazy_params(struct index_state *istate)`
			`{`
			`return 0;`
			`}`

			`static inline void threaded_lazy_init_name_hash(`
			`struct index_state *istate)`
			`{`
			`}`

			`#else`

			`#include "thread-utils.h"`

			`/*`
			`* Set a minimum number of cache_entries that we will handle per`
			`* thread and use that to decide how many threads to run (up to`
			`* the number on the system).`
			`*`
			`* For guidance setting the lower per-thread bound, see:`
			`* t/helper/test-lazy-init-name-hash --analyze`
			`*/`
			`#define LAZY_THREAD_COST (2000)`

			`/*`
			`* We use n mutexes to guard n partitions of the "istate->dir_hash"`
			`* hashtable. Since "find" and "insert" operations will hash to a`
			`* particular bucket and modify/search a single chain, we can say`
			`* that "all chains mod n" are guarded by the same mutex -- rather`
			`* than having a single mutex to guard the entire table. (This does`
			`* require that we disable "rehashing" on the hashtable.)`
			`*`
			`* So, a larger value here decreases the probability of a collision`
			`* and the time that each thread must wait for the mutex.`
			`*/`
			`#define LAZY_MAX_MUTEX (32)`

			`static pthread_mutex_t *lazy_dir_mutex_array;`

			`/*`
			`* An array of lazy_entry items is used by the n threads in`
			`* the directory parse (first) phase to (lock-free) store the`
			`* intermediate results. These values are then referenced by`
			`* the 2 threads in the second phase.`
			`*/`
			`struct lazy_entry {`
			`struct dir_entry *dir;`
			`unsigned int hash_dir;`
			`unsigned int hash_name;`
			`};`

			`/*`
			`* Decide if we want to use threads (if available) to load`
			`* the hash tables. We set "lazy_nr_dir_threads" to zero when`
			`* it is not worth it.`
			`*/`
			`static int lookup_lazy_params(struct index_state *istate)`
			`{`
			`int nr_cpus;`

			`lazy_nr_dir_threads = 0;`

			`if (!lazy_try_threaded)`
			`return 0;`

			`/*`
			`* If we are respecting case, just use the original`
			`* code to build the "istate->name_hash". We don't`
			`* need the complexity here.`
			`*/`
			`if (!ignore_case)`
			`return 0;`

			`nr_cpus = online_cpus();`
			`if (nr_cpus < 2)`
			`return 0;`

			`if (istate->cache_nr < 2 * LAZY_THREAD_COST)`
			`return 0;`
			`if (istate->cache_nr < nr_cpus * LAZY_THREAD_COST)`
			`nr_cpus = istate->cache_nr / LAZY_THREAD_COST;`
			`lazy_nr_dir_threads = nr_cpus;`
			`return lazy_nr_dir_threads;`
			`}`
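The guards visible above reduce to a small piece of arithmetic. The sketch below restates them as a pure function so the thresholds are easier to see. `pick_dir_threads` and its parameters are hypothetical names, and `cost` merely stands in for `LAZY_THREAD_COST`, whose actual value is not shown in this part of the file.

```c
#include <assert.h>

/*
 * Hypothetical restatement of the thread-count guards: returns 0 when
 * the single-threaded path should be used, otherwise the number of
 * "dir" threads. "cost" stands in for LAZY_THREAD_COST.
 */
static int pick_dir_threads(int ignore_case, int nr_cpus,
			    unsigned int cache_nr, unsigned int cost)
{
	assert(cost > 0);

	if (!ignore_case)
		return 0;
	if (nr_cpus < 2)
		return 0;
	if (cache_nr < 2 * cost)
		return 0;
	/* Too few entries to keep every CPU busy: use fewer threads. */
	if (cache_nr < (unsigned int)nr_cpus * cost)
		nr_cpus = (int)(cache_nr / cost);
	return nr_cpus;
}
```

With an assumed cost of 2000 entries per thread, an index needs at least 4000 entries before any threads are used, and an 8-CPU machine only gets all 8 threads once the index reaches 16000 entries.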

			`/*`
			`* Initialize n mutexes for use when searching and inserting`
			`* into "istate->dir_hash". All "dir" threads are trying`
			`* to insert partial pathnames into the hash as they iterate`
			`* over their portions of the index, so lock contention is`
			`* high.`
			`*`
			`* However, the hashmap is going to put items into bucket`
			`* chains based on their hash values. Use that to create n`
			`* mutexes and lock on mutex[bucket(hash) % n]. This will`
			`* decrease the collision rate (hopefully) by a factor of n.`
			`*/`
			`static void init_dir_mutex(void)`
			`{`
			`int j;`

			`lazy_dir_mutex_array = xcalloc(LAZY_MAX_MUTEX, sizeof(pthread_mutex_t));`

			`for (j = 0; j < LAZY_MAX_MUTEX; j++)`
			`init_recursive_mutex(&lazy_dir_mutex_array[j]);`
			`}`

			`static void cleanup_dir_mutex(void)`
			`{`
			`int j;`

			`for (j = 0; j < LAZY_MAX_MUTEX; j++)`
			`pthread_mutex_destroy(&lazy_dir_mutex_array[j]);`

			`free(lazy_dir_mutex_array);`
			`}`

			`static void lock_dir_mutex(int j)`
			`{`
			`pthread_mutex_lock(&lazy_dir_mutex_array[j]);`
			`}`

			`static void unlock_dir_mutex(int j)`
			`{`
			`pthread_mutex_unlock(&lazy_dir_mutex_array[j]);`
			`}`

			`static inline int compute_dir_lock_nr(`
			`const struct hashmap *map,`
			`unsigned int hash)`
			`{`
			`return hashmap_bucket(map, hash) % LAZY_MAX_MUTEX;`
			`}`
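The mapping from a hash value to one of the n mutexes can be sketched in isolation. This is only an illustration under stated assumptions: `bucket_of` assumes a power-of-two table where the low bits of the hash select the bucket (the real `hashmap_bucket()` is not shown in this part of the file), and `N_STRIPES` is a hypothetical stand-in for `LAZY_MAX_MUTEX`.

```c
#include <assert.h>

#define N_STRIPES 32 /* hypothetical; stands in for LAZY_MAX_MUTEX */

/* Assumed bucket function: power-of-two table, low bits pick the bucket. */
static unsigned int bucket_of(unsigned int hash, unsigned int table_size)
{
	assert(table_size && (table_size & (table_size - 1)) == 0);
	return hash & (table_size - 1);
}

/* Index of the mutex that guards the bucket chain for "hash". */
static unsigned int stripe_of(unsigned int hash, unsigned int table_size)
{
	return bucket_of(hash, table_size) % N_STRIPES;
}
```

The property that matters is that two hashes landing in the same bucket always map to the same mutex, so a bucket chain is never modified under two different locks. This only holds while the table size stays fixed, which is presumably why rehashing is disallowed around the threaded build.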

			`static struct dir_entry *hash_dir_entry_with_parent_and_prefix(`
			`struct index_state *istate,`
			`struct dir_entry *parent,`
			`struct strbuf *prefix)`
			`{`
			`struct dir_entry *dir;`
			`unsigned int hash;`
			`int lock_nr;`

			`/*`
			`* Either we have a parent directory and path with slash(es)`
			`* or the directory is an immediate child of the root directory.`
			`*/`
			`assert((parent != NULL) ^ (strchr(prefix->buf, '/') == NULL));`

			`if (parent)`
			`hash = memihash_cont(parent->ent.hash,`
			`prefix->buf + parent->namelen,`
			`prefix->len - parent->namelen);`
			`else`
			`hash = memihash(prefix->buf, prefix->len);`

			`lock_nr = compute_dir_lock_nr(&istate->dir_hash, hash);`
			`lock_dir_mutex(lock_nr);`

			`dir = find_dir_entry__hash(istate, prefix->buf, prefix->len, hash);`
			`if (!dir) {`
			`FLEX_ALLOC_MEM(dir, name, prefix->buf, prefix->len);`
			`hashmap_entry_init(dir, hash);`
			`dir->namelen = prefix->len;`
			`dir->parent = parent;`
			`hashmap_add(&istate->dir_hash, dir);`

			`if (parent) {`
			`unlock_dir_mutex(lock_nr);`

			`/* All I really need here is an InterlockedIncrement(&(parent->nr)) */`
			`lock_nr = compute_dir_lock_nr(&istate->dir_hash, parent->ent.hash);`
			`lock_dir_mutex(lock_nr);`
			`parent->nr++;`
			`}`
			`}`

			`unlock_dir_mutex(lock_nr);`

			`return dir;`
			`}`

			`/*`
			`* handle_range_1() and handle_range_dir() are derived from`
			`* clear_ce_flags_1() and clear_ce_flags_dir() in unpack-trees.c`
			`* and handle the iteration over the entire array of index entries.`
			`* They use recursion for adjacent entries in the same parent`
			`* directory.`
			`*/`
			`static int handle_range_1(`
			`struct index_state *istate,`
			`int k_start,`
			`int k_end,`
			`struct dir_entry *parent,`
			`struct strbuf *prefix,`
			`struct lazy_entry *lazy_entries);`

			`static int handle_range_dir(`
			`struct index_state *istate,`
			`int k_start,`
			`int k_end,`
			`struct dir_entry *parent,`
			`struct strbuf *prefix,`
			`struct lazy_entry *lazy_entries,`
			`struct dir_entry **dir_new_out)`
			`{`
			`int rc, k;`
			`int input_prefix_len = prefix->len;`
			`struct dir_entry *dir_new;`

			`dir_new = hash_dir_entry_with_parent_and_prefix(istate, parent, prefix);`

			`strbuf_addch(prefix, '/');`

			`/*`
			`* Scan forward in the index array for index entries having the same`
			`* path prefix (that are also in this directory).`
			`*/`
name-hash: fix buffer overrun Add check for the end of the entries for the thread partition. Add test for lazy init name hash with specific directory structure The lazy init hash name was causing a buffer overflow when the last entry in the index was multiple folder deep with parent folders that did not have any files in them. This adds a test for the boundary condition of the thread partitions with the folder structure that was triggering the buffer overflow. The fix was to check if it is the last entry for the thread partition in the handle_range_dir and not try to use the next entry in the cache. Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-31 19:32:14 +02:00			`if (k_start + 1 >= k_end)`
			`k = k_end;`
			`else if (strncmp(istate->cache[k_start + 1]->name, prefix->buf, prefix->len) > 0)`
			`k = k_start + 1;`
			`else if (strncmp(istate->cache[k_end - 1]->name, prefix->buf, prefix->len) == 0)`
			`k = k_end;`
			`else {`
			`int begin = k_start;`
			`int end = k_end;`
			`while (begin < end) {`
			`int mid = (begin + end) >> 1;`
			`int cmp = strncmp(istate->cache[mid]->name, prefix->buf, prefix->len);`
			`if (cmp == 0) /* mid has same prefix; look in second part */`
			`begin = mid + 1;`
			`else if (cmp > 0) /* mid is past group; look in first part */`
			`end = mid;`
			`else`
			`die("cache entry out of order");`
			`}`
			`k = begin;`
			`}`

			`/*`
			`* Recurse and process what we can of this subset [k_start, k).`
			`*/`
			`rc = handle_range_1(istate, k_start, k, dir_new, prefix, lazy_entries);`

			`strbuf_setlen(prefix, input_prefix_len);`

			`*dir_new_out = dir_new;`
			`return rc;`
			`}`
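The binary search in handle_range_dir() can be tried out on a plain sorted array of path names. The sketch below is a simplified stand-in, not the function above: `find_group_end` is a hypothetical name, it works on `const char *` strings rather than cache entries, and it returns -1 instead of calling die() when it sees an out-of-order entry. It assumes `names[k_start]` already starts with `prefix`.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the first index in [k_start, k_end] whose name does not start
 * with "prefix", assuming names[] is sorted and names[k_start] has the
 * prefix. Returns -1 if an entry sorts before the prefix group.
 */
static int find_group_end(const char **names, int k_start, int k_end,
			  const char *prefix)
{
	size_t plen = strlen(prefix);
	int begin = k_start;
	int end = k_end;

	assert(k_start < k_end);
	while (begin < end) {
		int mid = begin + (end - begin) / 2;
		int cmp = strncmp(names[mid], prefix, plen);

		if (cmp == 0)		/* mid has same prefix; look right */
			begin = mid + 1;
		else if (cmp > 0)	/* mid is past the group; look left */
			end = mid;
		else
			return -1;	/* entry out of order */
	}
	return begin;
}
```

For the sorted names "a/x", "a/y", "a/z", "b/q", "c" and the prefix "a/", the search settles on index 3, so the entries in [0, 3) belong to directory "a".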

			`static int handle_range_1(`
			`struct index_state *istate,`
			`int k_start,`
			`int k_end,`
			`struct dir_entry *parent,`
			`struct strbuf *prefix,`
			`struct lazy_entry *lazy_entries)`
			`{`
			`int input_prefix_len = prefix->len;`
			`int k = k_start;`

			`while (k < k_end) {`
			`struct cache_entry *ce_k = istate->cache[k];`
			`const char *name, *slash;`

			`if (prefix->len && strncmp(ce_k->name, prefix->buf, prefix->len))`
			`break;`

			`name = ce_k->name + prefix->len;`
			`slash = strchr(name, '/');`

			`if (slash) {`
			`int len = slash - name;`
			`int processed;`
			`struct dir_entry *dir_new;`

			`strbuf_add(prefix, name, len);`
			`processed = handle_range_dir(istate, k, k_end, parent, prefix, lazy_entries, &dir_new);`
			`if (processed) {`
			`k += processed;`
			`strbuf_setlen(prefix, input_prefix_len);`
			`continue;`
			`}`

			`strbuf_addch(prefix, '/');`
			`processed = handle_range_1(istate, k, k_end, dir_new, prefix, lazy_entries);`
			`k += processed;`
			`strbuf_setlen(prefix, input_prefix_len);`
			`continue;`
			`}`

			`/*`
			`* It is too expensive to take a lock to insert "ce_k"`
			`* into "istate->name_hash" and increment the ref-count`
			`* on the "parent" dir. So we defer actually updating`
			`* permanent data structures until phase 2 (where we`
			`* can change the locking requirements) and simply`
			`* accumulate our current results into the lazy_entries`
			`* data array).`
			`*`
			`* We do not need to lock the lazy_entries array because`
			`* we have exclusive access to the cells in the range`
			`* [k_start,k_end) that this thread was given.`
			`*/`
			`lazy_entries[k].dir = parent;`
			`if (parent) {`
			`lazy_entries[k].hash_name = memihash_cont(`
			`parent->ent.hash,`
			`ce_k->name + parent->namelen,`
			`ce_namelen(ce_k) - parent->namelen);`
			`lazy_entries[k].hash_dir = parent->ent.hash;`
			`} else {`
			`lazy_entries[k].hash_name = memihash(ce_k->name, ce_namelen(ce_k));`
			`}`

			`k++;`
			`}`

			`return k - k_start;`
			`}`
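Both functions above lean on the fact that a case-insensitive hash can be continued from the hash of a parent prefix instead of rehashing the whole path. The sketch below shows that property with an FNV-1 style hash. `ihash` and `ihash_cont` are hypothetical stand-ins for `memihash()` and `memihash_cont()`; the constants and the ASCII-only case folding are assumptions, since the real definitions live outside this file.

```c
#include <assert.h>
#include <stddef.h>

#define FNV32_BASE  0x811c9dc5u
#define FNV32_PRIME 0x01000193u

/* Continue a case-insensitive FNV-1 style hash from "seed". */
static unsigned int ihash_cont(unsigned int seed, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	unsigned int hash = seed;

	assert(buf || !len);
	while (len--) {
		unsigned int c = *p++;

		if (c >= 'a' && c <= 'z')	/* fold ASCII lowercase */
			c -= 'a' - 'A';
		hash = (hash * FNV32_PRIME) ^ c;
	}
	return hash;
}

/* Hash a whole buffer, starting from the FNV offset basis. */
static unsigned int ihash(const void *buf, size_t len)
{
	return ihash_cont(FNV32_BASE, buf, len);
}
```

Because the hash is computed byte by byte, hashing "dir" and then continuing with "/file" gives the same value as hashing "dir/file" in one go, which is what lets a child reuse `parent->ent.hash`.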

			`struct lazy_dir_thread_data {`
			`pthread_t pthread;`
			`struct index_state *istate;`
			`struct lazy_entry *lazy_entries;`
			`int k_start;`
			`int k_end;`
			`};`

			`static void *lazy_dir_thread_proc(void *_data)`
			`{`
			`struct lazy_dir_thread_data *d = _data;`
			`struct strbuf prefix = STRBUF_INIT;`
			`handle_range_1(d->istate, d->k_start, d->k_end, NULL, &prefix, d->lazy_entries);`
			`strbuf_release(&prefix);`
			`return NULL;`
			`}`

			`struct lazy_name_thread_data {`
			`pthread_t pthread;`
			`struct index_state *istate;`
			`struct lazy_entry *lazy_entries;`
			`};`

			`static void *lazy_name_thread_proc(void *_data)`
			`{`
			`struct lazy_name_thread_data *d = _data;`
			`int k;`

			`for (k = 0; k < d->istate->cache_nr; k++) {`
			`struct cache_entry *ce_k = d->istate->cache[k];`
			`ce_k->ce_flags |= CE_HASHED;`
			`hashmap_entry_init(ce_k, d->lazy_entries[k].hash_name);`
			`hashmap_add(&d->istate->name_hash, ce_k);`
			`}`

			`return NULL;`
			`}`

			`static inline void lazy_update_dir_ref_counts(`
			`struct index_state *istate,`
			`struct lazy_entry *lazy_entries)`
			`{`
			`int k;`

			`for (k = 0; k < istate->cache_nr; k++) {`
			`if (lazy_entries[k].dir)`
			`lazy_entries[k].dir->nr++;`
			`}`
			`}`

			`static void threaded_lazy_init_name_hash(`
			`struct index_state *istate)`
			`{`
			`int nr_each;`
			`int k_start;`
			`int t;`
			`struct lazy_entry *lazy_entries;`
			`struct lazy_dir_thread_data *td_dir;`
			`struct lazy_name_thread_data *td_name;`

			`k_start = 0;`
			`nr_each = DIV_ROUND_UP(istate->cache_nr, lazy_nr_dir_threads);`

			`lazy_entries = xcalloc(istate->cache_nr, sizeof(struct lazy_entry));`
			`td_dir = xcalloc(lazy_nr_dir_threads, sizeof(struct lazy_dir_thread_data));`
			`td_name = xcalloc(1, sizeof(struct lazy_name_thread_data));`

			`init_dir_mutex();`

			`/*`
			`* Phase 1:`
			`* Build "istate->dir_hash" using n "dir" threads (and a read-only index).`
			`*/`
			`for (t = 0; t < lazy_nr_dir_threads; t++) {`
			`struct lazy_dir_thread_data *td_dir_t = td_dir + t;`
			`td_dir_t->istate = istate;`
			`td_dir_t->lazy_entries = lazy_entries;`
			`td_dir_t->k_start = k_start;`
			`k_start += nr_each;`
			`if (k_start > istate->cache_nr)`
			`k_start = istate->cache_nr;`
			`td_dir_t->k_end = k_start;`
			`if (pthread_create(&td_dir_t->pthread, NULL, lazy_dir_thread_proc, td_dir_t))`
			`die("unable to create lazy_dir_thread");`
			`}`
			`for (t = 0; t < lazy_nr_dir_threads; t++) {`
			`struct lazy_dir_thread_data *td_dir_t = td_dir + t;`
			`if (pthread_join(td_dir_t->pthread, NULL))`
			`die("unable to join lazy_dir_thread");`
			`}`

			`/*`
			`* Phase 2:`
			`* Iterate over all index entries and add them to the "istate->name_hash"`
			`* using a single "name" background thread.`
			`* (Testing showed it wasn't worth running more than 1 thread for this.)`
			`*`
			`* Meanwhile, finish updating the parent directory ref-counts for each`
			`* index entry using the current thread. (This step is very fast and`
			`* doesn't need threading.)`
			`*/`
			`td_name->istate = istate;`
			`td_name->lazy_entries = lazy_entries;`
			`if (pthread_create(&td_name->pthread, NULL, lazy_name_thread_proc, td_name))`
			`die("unable to create lazy_name_thread");`

			`lazy_update_dir_ref_counts(istate, lazy_entries);`

			`if (pthread_join(td_name->pthread, NULL))`
			`die("unable to join lazy_name_thread");`

			`cleanup_dir_mutex();`

			`free(td_name);`
			`free(td_dir);`
			`free(lazy_entries);`
			`}`

			`#endif`
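The way phase 1 slices the index among the "dir" threads can be checked with a few lines of arithmetic. This is a sketch only: `thread_range` and `struct range` are hypothetical, and `DIV_ROUND_UP` is redefined locally so the block stands alone. It computes each range directly from the thread number rather than accumulating `k_start` in a loop, which gives the same ranges.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct range {
	int k_start;	/* first index owned by the thread */
	int k_end;	/* one past the last index owned */
};

/* Half-open range [k_start, k_end) of index entries for thread "t". */
static struct range thread_range(int cache_nr, int nr_threads, int t)
{
	struct range r;
	int nr_each;

	assert(nr_threads > 0 && t >= 0 && t < nr_threads);
	nr_each = DIV_ROUND_UP(cache_nr, nr_threads);

	r.k_start = t * nr_each;
	if (r.k_start > cache_nr)
		r.k_start = cache_nr;
	r.k_end = r.k_start + nr_each;
	if (r.k_end > cache_nr)
		r.k_end = cache_nr;
	return r;
}
```

With 10 entries and 3 threads the ranges are [0, 4), [4, 8) and [8, 10). Rounding up means a trailing thread can receive an empty range, for example [5, 5) for the fourth thread over 5 entries, which handle_range_1() tolerates because its loop body never runs.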

			`static void lazy_init_name_hash(struct index_state *istate)`
			`{`
			`if (istate->name_hash_initialized)`
			`return;`
name-hash.c: drop hashmap_cmp_fn cast Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-07-01 02:28:37 +02:00			`hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr);`
			`hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr);`

			`if (lookup_lazy_params(istate)) {`
			`hashmap_disallow_rehash(&istate->dir_hash, 1);`
			`threaded_lazy_init_name_hash(istate);`
			`hashmap_disallow_rehash(&istate->dir_hash, 0);`
			`} else {`
			`int nr;`
			`for (nr = 0; nr < istate->cache_nr; nr++)`
			`hash_index_entry(istate, istate->cache[nr]);`
			`}`

			`istate->name_hash_initialized = 1;`
			`}`

			`/*`
			`* A test routine for t/helper/ sources.`
			`*`
			`* Returns the number of threads used or 0 when`
			`* the non-threaded code path was used.`
			`*`
			`* Requesting threading WILL NOT override guards`
			`* in lookup_lazy_params().`
			`*/`
			`int test_lazy_init_name_hash(struct index_state *istate, int try_threaded)`
			`{`
			`lazy_nr_dir_threads = 0;`
			`lazy_try_threaded = try_threaded;`

			`lazy_init_name_hash(istate);`

			`return lazy_nr_dir_threads;`
			`}`

			`void add_name_hash(struct index_state *istate, struct cache_entry *ce)`
			`{`
			`if (istate->name_hash_initialized)`
			`hash_index_entry(istate, ce);`
			`}`

			`void remove_name_hash(struct index_state *istate, struct cache_entry *ce)`
			`{`
name-hash.c: remove cache entries instead of marking them CE_UNHASHED The new hashmap implementation supports remove, so really remove unused cache entries from the name hashmap instead of just marking them. The CE_UNHASHED flag and CE_STATE_MASK are no longer needed. Keep the CE_HASHED flag to prevent adding entries twice. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-14 20:22:27 +01:00			`if (!istate->name_hash_initialized || !(ce->ce_flags & CE_HASHED))`
			`return;`
			`ce->ce_flags &= ~CE_HASHED;`
			`hashmap_remove(&istate->name_hash, ce, ce);`

			`if (ignore_case)`
			`remove_dir_entry(istate, ce);`
			`}`

Make hash_name_lookup able to do case-independent lookups Right now nobody uses it, but "index_name_exists()" gets a flag so you can enable it on a case-by-case basis. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 23:55:19 +01:00			`static int slow_same_name(const char *name1, int len1, const char *name2, int len2)`
			`{`
			`if (len1 != len2)`
			`return 0;`

			`while (len1) {`
			`unsigned char c1 = *name1++;`
			`unsigned char c2 = *name2++;`
			`len1--;`
			`if (c1 != c2) {`
			`c1 = toupper(c1);`
			`c2 = toupper(c2);`
			`if (c1 != c2)`
			`return 0;`
			`}`
			`}`
			`return 1;`
			`}`

			`static int same_name(const struct cache_entry *ce, const char *name, int namelen, int icase)`
			`{`
			`int len = ce_namelen(ce);`

			`/*`
			`* Always do exact compare, even if we want a case-ignoring comparison;`
			`* we do the quick exact one first, because it will be the common case.`
			`*/`
name-hash.c: replace cache_name_compare() with memcmp(3) The same_name() private function wants a quick-and-exact check to see if they two names are byte-for-byte identical first and then fall back to the slow path. Use memcmp(3) for the former to make it clear that we do not want any "name" specific comparison. Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-06-20 04:06:43 +02:00			`if (len == namelen && !memcmp(name, ce->name, len))`
			`return 1;`

Add case insensitivity support for directories when using git status When using a case preserving but case insensitive file system, directory case can differ but still refer to the same physical directory. git status reports the directory with the alternate case as an Untracked file. (That is, when mydir/filea.txt is added to the repository and then the directory on disk is renamed from mydir/ to MyDir/, git status shows MyDir/ as being untracked.) Support has been added in name-hash.c for hashing directories with a terminating slash into the name hash. When index_name_exists() is called with a directory (a name with a terminating slash), the name is not found via the normal cache_name_compare() call, but it is found in the slow_same_name() function. Additionally, in dir.c, directory_exists_in_index_icase() allows newly added directories deeper in the directory chain to be identified. Ultimately, it would be better if the file list was read in case insensitive alphabetical order from disk, but this change seems to suffice for now. The end result is the directory is looked up in a case insensitive manner and does not show in the Untracked files list. Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-03 11:56:43 +02:00			`if (!icase)`
			`return 0;`

			`return slow_same_name(name, namelen, ce->name, len);`
			`}`
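The comparison strategy used by same_name() and slow_same_name(), a cheap exact check first and a per-character case-folding pass only as a fallback, can be sketched without any index structures. `names_match` is a hypothetical helper that takes two plain buffers; like the code above, it folds case with `toupper()` and so only handles what the C locale's `toupper()` handles.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if the two names are equal, 0 otherwise. When "icase" is
 * set, names that differ only in case are also treated as equal.
 */
static int names_match(const char *a, size_t alen,
		       const char *b, size_t blen, int icase)
{
	size_t i;

	assert(a && b);
	if (alen != blen)
		return 0;
	/* Fast path: byte-for-byte identical, the common case. */
	if (!memcmp(a, b, alen))
		return 1;
	if (!icase)
		return 0;
	/* Slow path: compare one character at a time, ignoring case. */
	for (i = 0; i < alen; i++) {
		if (toupper((unsigned char)a[i]) != toupper((unsigned char)b[i]))
			return 0;
	}
	return 1;
}
```

Putting the `memcmp()` check first keeps the common exact-match lookup cheap even when case-insensitive matching is enabled.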

name-hash: don't reuse cache_entry in dir_entry Stop reusing cache_entry in dir_entry; doing so causes a use-after-free bug. During merges, we free entries that we no longer need in the destination index. But those entries might have also been stored in the dir_entry cache, and when a later call to add_to_index found them, they would be used after being freed. To prevent this, change dir_entry to store a copy of the name instead of a pointer to a cache_entry. This entails some refactoring of code that expects the cache_entry. Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote the initial patch, but this version does not use any of Keith's code. Helped-by: Keith McGuigan <kmcguigan@twitter.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-10-21 19:54:11 +02:00			`int index_dir_exists(struct index_state *istate, const char *name, int namelen)`
name-hash: refactor polymorphic index_name_exists() Depending upon the absence or presence of a trailing '/' on the incoming pathname, index_name_exists() checks either if a file is present in the index or if a directory is represented within the index. Each caller explicitly chooses the mode of operation by adding or removing a trailing '/' before invoking index_name_exists(). Since these two modes of operations are disjoint and have no code in common (one searches index_state.name_hash; the other dir_hash), they can be represented more naturally as distinct functions: one to search for a file, and one for a directory. Splitting index searching into two functions relieves callers of the artificial burden of having to add or remove a slash to select the mode of operation; instead they just call the desired function. A subsequent patch will take advantage of this benefit in order to eliminate the requirement that the incoming pathname for a directory search must have a trailing slash. (In order to avoid disturbing in-flight topics, index_name_exists() is retained as a thin wrapper dispatching either to index_dir_exists() or index_file_exists().) Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-09-17 09:06:14 +02:00			`{`
			`struct dir_entry *dir;`

			`lazy_init_name_hash(istate);`
			`dir = find_dir_entry(istate, name, namelen);`
			`return dir && dir->nr;`
			`}`
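The use-after-free fix described in the commit message above makes each directory entry own a copy of its name instead of pointing into a `cache_entry` that may later be freed. The following is a minimal sketch of that ownership pattern, not Git's actual code: `demo_dir_entry`, `demo_dir_entry_new()` and `demo_dir_exists()` are hypothetical names, and the struct is reduced to the fields needed for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for Git's struct dir_entry. */
struct demo_dir_entry {
	int nr;               /* number of index entries below this directory */
	unsigned int namelen; /* length of name, without the terminating NUL */
	char name[];          /* private copy of the directory name */
};

/* Allocate an entry that owns a copy of the first namelen bytes of name. */
static struct demo_dir_entry *demo_dir_entry_new(const char *name, size_t namelen)
{
	struct demo_dir_entry *dir;

	assert(name != NULL);
	dir = malloc(sizeof(*dir) + namelen + 1);
	if (!dir)
		return NULL;
	dir->nr = 0;
	dir->namelen = (unsigned int)namelen;
	memcpy(dir->name, name, namelen);
	dir->name[namelen] = '\0';
	return dir;
}

/* Same shape as the "dir && dir->nr" test in index_dir_exists(). */
static int demo_dir_exists(const struct demo_dir_entry *dir)
{
	return dir && dir->nr;
}
```

Because the name is copied, the buffer it was taken from can be freed without invalidating the directory entry; the entry itself is released with a single `free()`.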

			`void adjust_dirname_case(struct index_state *istate, char *name)`
			`{`
			`const char *startPtr = name;`
			`const char *ptr = startPtr;`

			`lazy_init_name_hash(istate);`
			`while (*ptr) {`
			`while (*ptr && *ptr != '/')`
			`ptr++;`

			`if (*ptr == '/') {`
			`struct dir_entry *dir;`

			`/* look up the directory name without its trailing slash */`
			`dir = find_dir_entry(istate, name, ptr - name);`
			`if (dir) {`
			`memcpy((void *)startPtr, dir->name + (startPtr - name), ptr - startPtr);`
			`startPtr = ptr + 1;`
			`}`
			`ptr++;`
			`}`
			`}`
			`}`
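To make the loop in `adjust_dirname_case()` easier to follow, here is a self-contained sketch of the same idea: walk the path one component at a time, look up each leading directory prefix case-insensitively, and overwrite that component with the known casing. It is an illustration under stated assumptions, not Git's implementation: the fixed `demo_dirs` table stands in for `istate->dir_hash`, `demo_find_dir()` stands in for `find_dir_entry()`, and directory names are assumed to be stored without a trailing slash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Hypothetical stand-in for istate->dir_hash: canonical names, no trailing '/'. */
static const char *const demo_dirs[] = { "src", "src/Lib" };

/* Case-insensitive lookup of the first len bytes of name. */
static const char *demo_find_dir(const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(demo_dirs) / sizeof(demo_dirs[0]); i++)
		if (strlen(demo_dirs[i]) == len &&
		    !strncasecmp(demo_dirs[i], name, len))
			return demo_dirs[i];
	return NULL;
}

/* Rewrite each known leading directory component of name to its stored casing. */
static void demo_adjust_dirname_case(char *name)
{
	char *start = name;
	char *ptr = name;

	assert(name != NULL);
	while (*ptr) {
		while (*ptr && *ptr != '/')
			ptr++;

		if (*ptr == '/') {
			/* ptr - name is the prefix length without the slash */
			const char *dir = demo_find_dir(name, (size_t)(ptr - name));

			if (dir) {
				/* copy only the component between start and the slash */
				memcpy(start, dir + (start - name), (size_t)(ptr - start));
				start = ptr + 1;
			}
			ptr++;
		}
	}
}
```

With this table, a buffer holding `SRC/lib/Main.c` is rewritten in place to `src/Lib/Main.c`, while a path under an unknown directory is left unchanged. The final component (the file name) is never touched because it is not followed by a slash.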

			`struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int icase)`
			`{`
			`struct cache_entry *ce;`

			`lazy_init_name_hash(istate);`

hashmap: add simplified hashmap_get_from_hash() API Hashmap entries are typically looked up by just a key. The hashmap_get() API expects an initialized entry structure instead, to support compound keys. This flexibility is currently only needed by find_dir_entry() in name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other (currently five) call sites of hashmap_get() have to set up a near emtpy entry structure, resulting in duplicate code like this: struct hashmap_entry keyentry; hashmap_entry_init(&keyentry, hash(key)); return hashmap_get(map, &keyentry, key); Add a hashmap_get_from_hash() API that allows hashmap lookups by just specifying the key and its hash code, i.e.: return hashmap_get_from_hash(map, hash(key), key); Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-07-03 00:22:11 +02:00			`ce = hashmap_get_from_hash(&istate->name_hash,`
			`memihash(name, namelen), NULL);`
			`while (ce) {`
name-hash.c: remove cache entries instead of marking them CE_UNHASHED The new hashmap implementation supports remove, so really remove unused cache entries from the name hashmap instead of just marking them. The CE_UNHASHED flag and CE_STATE_MASK are no longer needed. Keep the CE_HASHED flag to prevent adding entries twice. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-14 20:22:27 +01:00			`if (same_name(ce, name, namelen, icase))`
			`return ce;`
name-hash.c: use new hash map implementation for cache entries Note: the "ce->next = NULL;" in unpack-trees.c::do_add_entry can safely be removed, as ce->next (now ce->ent.next) is always properly initialized in name-hash.c::hash_index_entry. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-14 20:21:58 +01:00			`ce = hashmap_get_next(&istate->name_hash, ce);`
			`}`
Make "index_name_exists()" return the cache_entry it found This allows verify_absent() in unpack_trees() to use the hash chains rather than looking it up using the binary search. Perhaps more importantly, it's also going to be useful for the next phase, where we actually start looking at the cache entry when we do case-insensitive lookups and checking the result. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-03-21 23:53:00 +01:00			`return NULL;`
			`}`
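The refactoring commit message earlier in this file describes `index_name_exists()` as a thin wrapper that picks directory or file lookup based on a trailing `/`. A rough sketch of that dispatch follows. All `demo_*` names and the two fixed tables are hypothetical stand-ins for the real hash lookups, and stripping the slash before the directory lookup is an assumption of this sketch rather than something the source states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the file and directory hash tables. */
static const char *const demo_files[] = { "Makefile", "src/main.c" };
static const char *const demo_dirs[] = { "src" };

/* Exact-match lookup of the first namelen bytes of name in a table. */
static int demo_lookup(const char *const *table, size_t n,
		       const char *name, size_t namelen)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strlen(table[i]) == namelen && !memcmp(table[i], name, namelen))
			return 1;
	return 0;
}

static int demo_file_exists(const char *name, size_t namelen)
{
	return demo_lookup(demo_files, sizeof(demo_files) / sizeof(demo_files[0]),
			   name, namelen);
}

static int demo_dir_exists(const char *name, size_t namelen)
{
	return demo_lookup(demo_dirs, sizeof(demo_dirs) / sizeof(demo_dirs[0]),
			   name, namelen);
}

/* Thin wrapper: a trailing '/' selects the directory lookup. */
static int demo_name_exists(const char *name, size_t namelen)
{
	assert(name != NULL);
	if (namelen > 0 && name[namelen - 1] == '/')
		return demo_dir_exists(name, namelen - 1); /* drop the slash */
	return demo_file_exists(name, namelen);
}
```

Callers that already know which kind of name they hold can call `demo_file_exists()` or `demo_dir_exists()` directly, which is the point of the split: the trailing slash no longer has to be added or removed just to select a mode.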

			`void free_name_hash(struct index_state *istate)`
			`{`
			`if (!istate->name_hash_initialized)`
			`return;`
			`istate->name_hash_initialized = 0;`

			`hashmap_free(&istate->name_hash, 0);`
name-hash.c: use new hash map implementation for directories Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-11-14 20:20:58 +01:00			`hashmap_free(&istate->dir_hash, 1);`
			`}`