Replace open-coded reallocation with ALLOC_GROW() macro.
* dd/use-alloc-grow:
sha1_file.c: use ALLOC_GROW() in pretend_sha1_file()
read-cache.c: use ALLOC_GROW() in add_index_entry()
builtin/mktree.c: use ALLOC_GROW() in append_to_tree()
attr.c: use ALLOC_GROW() in handle_attr_line()
dir.c: use ALLOC_GROW() in create_simplify()
reflog-walk.c: use ALLOC_GROW()
replace_object.c: use ALLOC_GROW() in register_replace_object()
patch-ids.c: use ALLOC_GROW() in add_commit()
diffcore-rename.c: use ALLOC_GROW()
diff.c: use ALLOC_GROW()
commit.c: use ALLOC_GROW() in register_commit_graft()
cache-tree.c: use ALLOC_GROW() in find_subtree()
bundle.c: use ALLOC_GROW() in add_to_ref_list()
builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
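For readers unfamiliar with the idiom, here is a minimal standalone sketch of a grow-on-demand macro modeled on ALLOC_GROW(). It is an approximation under stated assumptions: it calls plain realloc() and abort() where Git uses its own allocation wrappers, and "struct int_list" and int_list_append() are hypothetical names used only for illustration.

```c
#include <stdlib.h>

/* Growth policy: about 1.5x plus slack, so repeated appends are amortized. */
#define alloc_nr(x) (((x) + 16) * 3 / 2)

/*
 * Make sure array "x" has room for at least "nr" elements; "alloc"
 * tracks the current capacity. Simplified: plain realloc() and abort()
 * stand in for Git's own allocation wrappers.
 */
#define ALLOC_GROW(x, nr, alloc) \
	do { \
		if ((nr) > (alloc)) { \
			if (alloc_nr(alloc) < (nr)) \
				(alloc) = (nr); \
			else \
				(alloc) = alloc_nr(alloc); \
			(x) = realloc((x), (alloc) * sizeof(*(x))); \
			if (!(x)) \
				abort(); \
		} \
	} while (0)

/* Hypothetical container used only to demonstrate the macro. */
struct int_list {
	int *items;
	size_t nr, alloc;
};

/* Append one value, growing the backing array only when needed. */
static void int_list_append(struct int_list *list, int value)
{
	ALLOC_GROW(list->items, list->nr + 1, list->alloc);
	list->items[list->nr++] = value;
}
```

The point of the series is that each open-coded "if (nr >= alloc) { alloc = ...; realloc(...); }" sequence collapses into the single ALLOC_GROW() line seen in int_list_append().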
We started using wildmatch() in place of fnmatch(3); complete the
process and stop using fnmatch(3).
* nd/no-more-fnmatch:
actually remove compat fnmatch source code
stop using fnmatch (either native or compat)
Revert "test-wildmatch: add "perf" command to compare wildmatch and fnmatch"
use wildmatch() directly without fnmatch() wrapper
Trailing whitespace in .gitignore files, unless quoted for
fnmatch(3) (e.g. "path\ "), now triggers a warning and is ignored.
Strictly speaking, this is a backward incompatible change, but very
unlikely to bite any sane user and adjusting should be obvious and
easy.
* nd/gitignore-trailing-whitespace:
t0008: skip trailing space test on Windows
dir: ignore trailing spaces in exclude patterns
dir: warn about trailing spaces in exclude patterns
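The trimming rule described above can be sketched as follows. This is an illustrative helper, not the code in dir.c: the function name is hypothetical, and it assumes that a backslash quotes the character that follows it, so that "path\ " keeps its final space.

```c
#include <stddef.h>

/*
 * Strip trailing spaces from an exclude pattern in place, keeping any
 * space that is quoted with a backslash. Returns the number of
 * characters removed, so a caller could decide whether to warn.
 * Hypothetical helper for illustration only.
 */
static size_t trim_unquoted_trailing_spaces(char *buf)
{
	size_t i, keep = 0;

	for (i = 0; buf[i]; i++) {
		if (buf[i] == '\\' && buf[i + 1]) {
			i++;          /* the quoted character is always kept */
			keep = i + 1;
		} else if (buf[i] != ' ') {
			keep = i + 1; /* last significant character so far */
		}
	}
	buf[keep] = '\0';
	return i - keep;
}
```

With this rule "path  " becomes "path", while "path\ " is left untouched.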
This patch activates the DO_MATCH_DIRECTORY code in m_p_i(), which
makes "git diff HEAD submodule/" and "git diff HEAD submodule" produce
the same output. Previously only the version without trailing slash
returned the difference (if any).
That's the effect of new ce_path_match(). dir_path_match() is not
executed by the new tests. And it should not introduce regressions.
Previously if path "dir/" is passed in with pathspec "dir/", they
obviously match. With new dir_path_match(), the path becomes
_directory_ "dir" vs pathspec "dir/", which is not executed by the old
code path in m_p_i(). The new code path is executed and produces the
same result.
The other case is pathspec "dir" and path "dir/" is now turned to
"dir" (with DO_MATCH_DIRECTORY). Still the same result before or after
the patch.
So why change? Because of the next patch about clean.c.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently we do not support matching pathspec "foo/" against directory
"foo". That is because match_pathspec() has no way to tell whether "foo"
is a directory, and matching "foo/" against _file_ "foo" is wrong.
Now that callers can tell match_pathspec() whether "foo" is a directory,
we can make an exception for this case. The code is not executed yet,
though, because no caller passes the flag.
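A minimal sketch of the rule, assuming the caller knows whether the path is a directory. The function and parameter names are hypothetical, and wildcards, prefixes and pathspec magic are deliberately ignored:

```c
#include <string.h>

/*
 * Return 1 if "pathspec" names "path" exactly. A trailing slash in the
 * pathspec means "must be a directory", so "foo/" matches directory
 * "foo" but never file "foo". Illustration only.
 */
static int match_name(const char *path, int path_is_dir, const char *pathspec)
{
	size_t plen = strlen(path);
	size_t slen = strlen(pathspec);

	if (slen && pathspec[slen - 1] == '/') {
		if (!path_is_dir)
			return 0;
		slen--; /* compare without the trailing slash */
	}
	return plen == slen && !strncmp(path, pathspec, slen);
}
```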
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A long time ago, for some reason I was not happy with
match_pathspec(). I created a better version, match_pathspec_depth(),
that was supposed to replace match_pathspec() eventually.
match_pathspec() has now been gone for six months. Use the shorter
name for match_pathspec_depth().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make it clear that we don't use fnmatch() anymore.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up and protection against concurrent write access to the
ref namespace.
* mh/safe-create-leading-directories:
rename_tmp_log(): on SCLD_VANISHED, retry
rename_tmp_log(): limit the number of remote_empty_directories() attempts
rename_tmp_log(): handle a possible mkdir/rmdir race
rename_ref(): extract function rename_tmp_log()
remove_dir_recurse(): handle disappearing files and directories
remove_dir_recurse(): tighten condition for removing unreadable dir
lock_ref_sha1_basic(): if locking fails with ENOENT, retry
lock_ref_sha1_basic(): on SCLD_VANISHED, retry
safe_create_leading_directories(): add new error value SCLD_VANISHED
cmd_init_db(): when creating directories, handle errors conservatively
safe_create_leading_directories(): introduce enum for return values
safe_create_leading_directories(): always restore slash at end of loop
safe_create_leading_directories(): split on first of multiple slashes
safe_create_leading_directories(): rename local variable
safe_create_leading_directories(): add explicit "slash" pointer
safe_create_leading_directories(): reduce scope of local variable
safe_create_leading_directories(): fix format of "if" chaining
If a file or directory that we are trying to remove disappears (e.g.,
because another process has pruned it), do not consider it an error.
However, if REMOVE_DIR_KEEP_TOPLEVEL is set, and the toplevel
directory is missing, then consider it an error (like before).
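The "a vanished path is not an error" half of this rule can be sketched with plain POSIX calls. The helper name is hypothetical, and the REMOVE_DIR_KEEP_TOPLEVEL exception is not modeled here:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Remove a file, treating "already gone" as success. Returns 0 if the
 * file was removed or had already vanished, -1 (with errno set) on any
 * other failure. Hypothetical helper for illustration only.
 */
static int unlink_or_vanished(const char *path)
{
	if (!unlink(path) || errno == ENOENT)
		return 0;
	return -1;
}
```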
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If opendir() fails on the top-level directory, it makes sense to try
to delete it anyway--but only if the failure was due to EACCES.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
directory_exists_in_index_icase() dangerously assumed that it could
access one character beyond the end of its directory argument, and that
that character would unconditionally be '/'. 2eac2a4c (ls-files -k: a
directory only can be killed if the index has a non-directory,
2013-08-15) added a caller which did not respect this undocumented
assumption, and 680be044 (dir.c::test_one_path(): work around
directory_exists_in_index_icase() breakage, 2013-08-23) added a
work-around which temporarily appends a '/' before invoking
directory_exists_in_index_icase().
Since the dangerous behavior of directory_exists_in_index_icase() has
been eliminated, the work-around is now redundant, so retire it (but not
the tests added by the same commit).
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 5102c617 (Add case insensitivity support for directories when using
git status, 2010-10-03) added directories to the name-hash there was
only a single hash table in which both real cache entries and leading
directory prefixes were registered. To distinguish between the two types
of entries, directories were stored with a trailing '/'.
2092678c (name-hash.c: fix endless loop with core.ignorecase=true,
2013-02-28), however, moved directories to a separate hash table
(index_state.dir_hash) but retained the (now) redundant trailing '/',
thus callers continue to bear the burden of ensuring the slash's
presence before searching the index for a directory. Eliminate this
redundancy by storing paths in the dir-hash without the trailing '/'.
An important benefit of this change is that it eliminates undocumented
and dangerous behavior of dir.c:directory_exists_in_index_icase() in
which it assumes not only that it can validly access one character
beyond the end of its incoming directory argument, but also that that
character will unconditionally be a '/'. This perilous behavior was
"tolerated" because the string passed in by its lone caller always had a
'/' in that position, however, things broke [1] when 2eac2a4c (ls-files
-k: a directory only can be killed if the index has a non-directory,
2013-08-15) added a new caller which failed to respect the undocumented
assumption.
[1]: http://thread.gmane.org/gmane.comp.version-control.git/232727
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each caller of index_name_exists() knows whether it is looking for a
directory or a file, and can avoid the unnecessary indirection of
index_name_exists() by instead calling index_dir_exists() or
index_file_exists() directly.
Invoking the appropriate search function explicitly will allow a
subsequent patch to relieve callers of the artificial burden of having
to add a trailing '/' to the pathname given to index_dir_exists().
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git ls-files -k" needs to crawl only the part of the working tree
that may overlap the paths in the index to find killed files, but
shared code with the logic to find all the untracked files, which
made it unnecessarily inefficient.
* jc/ls-files-killed-optim:
dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage
t3010: update to demonstrate "ls-files -k" optimization pitfalls
ls-files -k: a directory only can be killed if the index has a non-directory
dir.c: use the cache_* macro to access the current index
"git mv A B" when moving a submodule A does "the right thing",
including relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
directory_exists_in_index() takes pathname and its length, but its
helper function directory_exists_in_index_icase() reads one byte
beyond the end of the pathname and expects there to be a '/'.
This needs to be fixed, as that one-byte-beyond-the-end location may
not even be readable, possibly by not registering directories to
name hashes with trailing slashes. In the meantime, update the new
caller added recently to treat_one_path() to make sure that the path
buffer it gives the function is one byte longer than the path it is
asking the function about by appending a slash to it.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"ls-files -o" and "ls-files -k" both traverse the working tree down
to find either all untracked paths or those that will be "killed"
(removed from the working tree to make room) when the paths recorded
in the index are checked out. It is necessary to traverse the
working tree fully when enumerating all the "other" paths, but when
we are only interested in "killed" paths, we can take advantage of
the fact that paths that do not overlap with entries in the index
can never be killed.
The treat_one_path() helper function, which is called during the
recursive traversal, is the ideal place to implement an
optimization.
When we are looking at a directory P in the working tree, there are
three cases:
(1) P exists in the index. Everything inside the directory P in
the working tree needs to go when P is checked out from the
index.
(2) P does not exist in the index, but there is P/Q in the index.
We know P will stay a directory when we check out the contents
of the index, but we do not know yet if there is a directory
P/Q in the working tree to be killed, so we need to recurse.
(3) P does not exist in the index, and there is no P/Q in the index
to require P to be a directory, either. Only in this case, we
know that everything inside P will not be killed without
recursing.
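The three cases can be sketched as a small classifier. This is a toy model, not the dir.c code: the index is represented as a flat array of path strings, and the enum and function names are hypothetical.

```c
#include <string.h>

enum dir_case {
	P_IN_INDEX = 1,    /* case (1): P itself is an index entry */
	P_HAS_ENTRIES = 2, /* case (2): some P/Q is in the index */
	P_UNRELATED = 3    /* case (3): nothing in the index overlaps P */
};

/* Classify working-tree directory "p" against the paths in the index. */
static enum dir_case classify_dir(const char **index_paths, size_t nr,
				  const char *p)
{
	size_t i, plen = strlen(p);
	enum dir_case result = P_UNRELATED;

	for (i = 0; i < nr; i++) {
		const char *e = index_paths[i];

		if (!strcmp(e, p))
			return P_IN_INDEX;
		if (!strncmp(e, p, plen) && e[plen] == '/')
			result = P_HAS_ENTRIES;
	}
	return result;
}
```

Only P_UNRELATED lets the traversal skip the directory without recursing; P_HAS_ENTRIES still requires descending into P.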
Note that this helper is called by treat_leading_path() that decides
if we need to traverse only subdirectories of a single common
leading directory, which is essential for this optimization to be
correct. This caller checks each level of the leading path
component from shallower directory to deeper ones, and that is what
allows us to only check if the path appears in the index. If the
call to treat_one_path() weren't there, given a path P/Q/R, the real
traversal may start from directory P/Q/R, even when the index
records P as a regular file, and we would end up having to check if
any leading subpath in P/Q/R, e.g. P, appears in the index.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These codepaths always start from the_index and use index_*
functions, but there is no reason to do so. Use the compatibility
cache_* macro to access the current in-core index like everybody
else.
While at it, fix typo in the comment for a function to check if a
path within a directory appears in the index.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
:(glob)path differs from plain pathspec in that it uses wildmatch with
WM_PATHNAME while the other uses fnmatch without FNM_PATHNAME. The
difference lies in how '*' (and '**') is processed.
With the introduction of :(glob) and :(literal) and their global
options --[no]glob-pathspecs, the user can:
 - make everything literal by default via --noglob-pathspecs
   (--literal-pathspecs cannot be used for this purpose as it
   disables _all_ pathspec magic)
 - individually turn on globbing with :(glob)
 - make everything globbing by default via --glob-pathspecs
 - individually turn off globbing with :(literal)
The implication is that there is no way to regain the default
matching behavior (i.e. fnmatch without FNM_PATHNAME). You get either
the new globbing or literal matching. The old fnmatch behavior is
considered deprecated and its use is discouraged.
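The difference in how '*' is treated can be seen with POSIX fnmatch(3) alone. The sketch below uses fnmatch() with FNM_PATHNAME as a stand-in for wildmatch with WM_PATHNAME, which is internal to Git; the two helper names are hypothetical, and '**' is not covered because fnmatch(3) has no equivalent.

```c
#include <fnmatch.h>

/* Plain pathspec behavior: '*' is allowed to match across '/'. */
static int match_legacy(const char *pattern, const char *path)
{
	return fnmatch(pattern, path, 0) == 0;
}

/* Pathname-aware behavior: '*' stops at '/'. */
static int match_pathname(const char *pattern, const char *path)
{
	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

For example, "*.c" matches "dir/a.c" under match_legacy() but not under match_pathname(), where "dir/*.c" is needed instead.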
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch is essentially a no-op, but it helps catch new uses of this
field. The field was introduced as an intermediate step for the
pathspec conversion and will be removed eventually. At this stage no
more access sites should be introduced.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
match_pathspec_depth() was created to replace match_pathspec() (see
61cf282 (pathspec: add match_pathspec_depth() - 2010-12-15)). It took
more than two years, but the replacement has finally happened :-)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at it, move free_pathspec() to pathspec.c.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code now takes advantage of nowildcard_len field.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GUARD_PATHSPEC() marks pathspec-sensitive code, basically all those
that touch anything in 'struct pathspec' except fields "nr" and
"original". GUARD_PATHSPEC() is not supposed to fail. It's mainly to
help the designers catch unsupported codepaths.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
match_pathspec_depth() and tree_entry_interesting() check max_depth
field in order to support "git grep --max-depth". The feature
activation is tied to "recursive" field, which led to some unwanted
activation, e.g. 5c8eeb8 (diff-index: enable recursive pathspec
matching in unpack_trees - 2012-01-15).
This patch decouples the activation from "recursive" field, puts it in
"magic" field instead. This makes sure that only "git grep" can
activate this feature. And because parse_pathspec knows when the
feature is not used, it does not need to sort pathspec (required for
max_depth to work correctly). A small win for non-grep cases.
Even though a new magic flag is introduced, no magic syntax is. The
magic can be only enabled by parse_pathspec() caller. We might someday
want to support ":(maxdepth:10)src." It all depends on actual use
cases.
max_depth feature cannot be enabled via init_pathspec() anymore. But
that's ok because init_pathspec() is on its way to /dev/null.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We usually use pathspec_item's match field for pathspec error
reporting. However, "match" (or "raw") does not show the magic part,
which will play a more important role later on. Preserve the exact
user input for reporting.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently to fill a struct pathspec, we do:
    const char **paths;
    paths = get_pathspec(prefix, argv);
    ...
    init_pathspec(&pathspec, paths);
"paths" can only carry bare strings, which loses information from
command line arguments such as pathspec magic or the prefix part's
length for each argument.
parse_pathspec() is introduced to combine the two calls into one. The
plan is to gradually replace all get_pathspec() and init_pathspec()
calls with parse_pathspec(). get_pathspec() now becomes a thin wrapper
around parse_pathspec().
parse_pathspec() allows the caller to reject the pathspec magics that
it does not support. When a new pathspec magic is introduced, we can
enable it per command after making sure that all underlying code has no
problem with the new magic.
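The per-command opt-in can be pictured as a bitmask check. The sketch below is purely illustrative: the magic bits, the function name and the "supported" direction of the mask are assumptions, not the actual parse_pathspec() interface.

```c
/* Hypothetical magic bits, for illustration only. */
#define MAGIC_LITERAL (1u << 0)
#define MAGIC_GLOB    (1u << 1)
#define MAGIC_ICASE   (1u << 2)

/*
 * Return the subset of "item_magic" that the calling command has not
 * declared support for; zero means the pathspec is acceptable.
 */
static unsigned unsupported_magic(unsigned item_magic, unsigned supported_mask)
{
	return item_magic & ~supported_mask;
}
```

A command that has only been audited for globbing would pass MAGIC_GLOB as its mask and reject any pathspec for which the result is non-zero.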
"flags" parameter is currently unused. But it would allow callers to
pass certain instructions to parse_pathspec, for example forcing
literal pathspec when no magic is used.
With the introduction of parse_pathspec, there are now two functions
that can initialize struct pathspec: init_pathspec and
parse_pathspec. Any semantic changes in struct pathspec must be
reflected in both functions. init_pathspec() will be phased out in
favor of parse_pathspec().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is
- diff-lib.c: setting CE_UPTODATE
- name-hash.c: setting CE_HASHED
- preload-index.c, read-cache.c, unpack-trees.c and
builtin/update-index: obvious
- entry.c: write_entry() may refresh the checked out entry via
fill_stat_cache_info(). This causes "non-const struct cache_entry
*" in builtin/apply.c, builtin/checkout-index.c and
builtin/checkout.c
- builtin/ls-files.c: --with-tree changes stagemask and may set
CE_UPDATE
Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. The others just use ce_flags for local purposes.
So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.
The actual patch is less valuable than the summary above. But if
anyone wants to re-identify the above sites, applying this patch and
then this:
    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;
will help quickly identify them without bogus warnings.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the working tree walker encounters a directory, it asks the
function treat_directory() if it should descend into it, show it as
an untracked directory, or do something else. When the directory is
the top of the submodule working tree, we used to say "That is an
untracked directory", which was bogus.
It is an entity that is tracked in the index of the repository we
are looking at, and it is not to be descended into. Return
path_none, not path_untracked, to report that.
The existing case that path_untracked is returned for a newly
discovered submodule that is not tracked in the index (this only
happens when DIR_NO_GITLINKS option is not used) is unchanged, but
that is exactly because the submodule is not tracked in the index.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of 95c6f271 "dir.c: unify is_excluded and is_path_excluded APIs", the
is_excluded API no longer recurses into directories that match an ignore
pattern, and returns the directory's ignored state for all contained paths.
This is OK for normal ignore patterns, i.e. ignoring a directory affects
the entire contents recursively.
Unfortunately, this also "works" for negated ignore patterns ('!dir'), i.e.
the entire contents is "not-ignored" recursively, regardless of ignore
patterns that match the contents directly.
In prep_exclude, skip recursing into a directory only if it is really
ignored (i.e. the ignore pattern is not negated).
Signed-off-by: Karsten Blees <blees@dcon.de>
Tested-by: Øystein Walle <oystwa@gmail.com>
Reviewed-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When $HOME is misconfigured to point at an unreadable directory, we
used to complain and die. This loosens the check.
* jn/config-ignore-inaccessible:
config: allow inaccessible configuration under $HOME
'git-status --ignored' still scans the work tree twice to collect
untracked and ignored files, respectively.
fill_directory / read_directory already supports collecting untracked and
ignored files in a single directory scan. However, the DIR_COLLECT_IGNORED
flag to enable this has some git-add specific side-effects (e.g. it
doesn't recurse into ignored directories, so listing ignored files with
--untracked=all doesn't work).
The DIR_SHOW_IGNORED flag doesn't list untracked files and returns ignored
files in dir_struct.entries[] (instead of dir_struct.ignored[] as
DIR_COLLECT_IGNORED). DIR_SHOW_IGNORED is used all throughout git.
We don't want to break the existing API, so let's introduce a new flag
DIR_SHOW_IGNORED_TOO that lists untracked as well as ignored files similar
to DIR_COLLECT_IGNORED, but will recurse into sub-directories based on the
other flags as DIR_SHOW_IGNORED does.
In dir.c::read_directory_recursive, add ignored files to either
dir_struct.entries[] or dir_struct.ignored[] based on the flags. Also move
the DIR_COLLECT_IGNORED case here so that filling result lists is in a
common place.
In wt-status.c::wt_status_collect_untracked, use the new flag and read
results from dir_struct.ignored[]. Remove the extra fill_directory call.
builtin/check-ignore.c doesn't call fill_directory, so setting the git-add
specific DIR_COLLECT_IGNORED flag has no effect here. Remove it for clarity.
Update API documentation to reflect the changes.
Performance: with this patch, 'git-status --ignored' is typically as fast
as 'git-status'.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git-status --ignored' recursively scans directories up to three times:
1. To collect untracked files.
2. To collect ignored files.
3. When collecting ignored files, to check that an untracked directory
that potentially contains ignored files doesn't also contain untracked
files (i.e. isn't already listed as untracked).
Let's get rid of case 3 first.
Currently, read_directory_recursive returns a boolean whether a directory
contains the requested files or not (actually, it returns the number of
files, but no caller actually needs that), and DIR_SHOW_IGNORED specifies
what we're looking for.
To be able to test for both untracked and ignored files in a single scan,
we need to return a bit more info, and the result must be independent of
the DIR_SHOW_IGNORED flag.
Reuse the path_treatment enum as return value of read_directory_recursive.
Split path_handled in two separate values path_excluded and path_untracked
that don't change their meaning with the DIR_SHOW_IGNORED flag. We don't
need an extra value path_untracked_and_excluded, as directories with both
untracked and ignored files should be listed as untracked.
Rename path_ignored to path_none for clarity (i.e. "don't treat that path"
in contrast to "the path is ignored and should be treated according to
DIR_SHOW_IGNORED").
Replace enum directory_treatment with path_treatment. That's just another
enum with the same meaning, no need to translate back and forth.
In treat_directory, get rid of the extra read_directory_recursive call and
all the DIR_SHOW_IGNORED-specific code.
In read_directory_recursive, decide whether to dir_add_name path_excluded
or path_untracked paths based on the DIR_SHOW_IGNORED flag.
The return value of read_directory_recursive is the maximum path_treatment
of all files and sub-directories. In the check_only case, abort when we've
reached the most significant value (path_untracked).
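The "maximum path_treatment" rule can be sketched as below. The enum ordering is an assumption consistent with the description above (path_untracked being the most significant value), and summarize() is a hypothetical stand-in that folds an array of per-entry results rather than walking a real directory.

```c
#include <stddef.h>

/* Ordered from least to most significant, so "max" picks the winner. */
enum path_treatment {
	path_none = 0,
	path_recurse,
	path_excluded,
	path_untracked
};

/*
 * Fold the treatment of every entry in a directory into one value.
 * With check_only set, stop as soon as the result cannot get any
 * higher.
 */
static enum path_treatment summarize(const enum path_treatment *children,
				     size_t nr, int check_only)
{
	enum path_treatment state = path_none;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (children[i] > state)
			state = children[i];
		if (check_only && state == path_untracked)
			break;
	}
	return state;
}
```

Because a directory with both untracked and ignored files folds to path_untracked, no separate "untracked and excluded" value is needed.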
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Checking if a file is in the index is much faster (hashtable lookup) than
checking if the file is excluded (linear search over exclude patterns).
Skip is_excluded checks for files: move the cache_name_exists check from
treat_file to treat_one_path and return early if the file is tracked.
This can safely be done as all other code paths also return path_ignored
for tracked files, and dir_add_ignored skips tracked files as well.
There's just one line left in treat_file, so move this to treat_one_path
as well.
Here's some performance data for git-status from the linux and WebKit
repos (best of 10 runs on a Debian Linux on SSD, core.preloadIndex=true):
       |     status     | status --ignored
       | linux | WebKit | linux | WebKit
-------+-------+--------+-------+--------
before | 0.218 |  1.583 | 0.321 |  2.579
after  | 0.156 |  0.988 | 0.202 |  1.279
gain   | 1.397 |  1.602 | 1.589 |  2.016
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The is_excluded and is_path_excluded APIs are very similar, except for a
few noteworthy differences:
is_excluded doesn't handle ignored directories, so results for paths within
ignored directories are incorrect. This is probably based on the premise
that recursive directory scans should stop at ignored directories, which
is no longer true (in certain cases, read_directory_recursive currently
calls is_excluded *and* is_path_excluded to get correct ignored state).
is_excluded caches parsed .gitignore files of the last directory in struct
dir_struct. If the directory changes, it finds a common parent directory
and is very careful to drop only as much state as necessary. On the other
hand, is_excluded will also read and parse .gitignore files in already
ignored directories, which are completely irrelevant.
is_path_excluded correctly handles ignored directories by checking if any
component in the path is excluded. As it uses is_excluded internally, this
unfortunately forces is_excluded to drop and re-read all .gitignore files,
as there is no common parent directory for the root dir.
is_path_excluded tracks state in a separate struct path_exclude_check,
which is essentially a wrapper of dir_struct with two more fields. However,
as is_path_excluded also modifies dir_struct, it is not possible to e.g.
use multiple path_exclude_check structures with the same dir_struct in
parallel. The additional structure just unnecessarily complicates the API.
Teach is_excluded / prep_exclude about ignored directories: whenever
entering a new directory, first check if the entire directory is excluded.
Remember the excluded state in dir_struct. Don't traverse into already
ignored directories (i.e. don't read irrelevant .gitignore files).
Directories could also be excluded by exclude patterns specified on the
command line or .git/info/exclude, so we cannot simply skip prep_exclude
entirely if there's no .gitignore file name (dir_struct.exclude_per_dir).
Move this check to just before actually reading the file.
is_path_excluded is now equivalent to is_excluded, so we can simply
redirect to it (the public API is cleaned up in the next patch).
The performance impact of the additional ignored check per directory is
hardly noticeable when reading directories recursively (e.g. 'git status').
However, performance of git commands using the is_path_excluded API (e.g.
'git ls-files --cached --ignored --exclude-standard') is greatly improved
as this no longer re-reads .gitignore files on each call.
Here's some performance data from the linux and WebKit repos (best of 10
runs on a Debian Linux on SSD, core.preloadIndex=true):
       |  ls-files -ci  |     status     | status --ignored
       | linux | WebKit | linux | WebKit | linux | WebKit
-------+-------+--------+-------+--------+-------+--------
before | 0.506 |  6.539 | 0.212 |  1.555 | 0.323 |  2.541
after  | 0.080 |  1.191 | 0.218 |  1.583 | 0.321 |  2.579
gain   | 6.325 |  5.490 | 0.972 |  0.982 | 1.006 |  0.985
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>