When we process "foo/" entries in gitignore files on a system
that does not have d_type member in "struct dirent", the earlier
implementation ran lstat(2) separately when matching with
entries that came from the command line, in-tree .gitignore
files, and $GIT_DIR/info/excludes file.
This optimizes it by delaying the lstat(2) call until it becomes
absolutely necessary.
The initial idea for this change was by Jeff King, but I
optimized it further to pass pointers to around.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A pattern "foo/" in the exclude list did not match directory
"foo", but a pattern "foo" did. This attempts to extend the
exclude mechanism so that it would while not matching a regular
file or a symbolic link "foo". In order to differentiate a
directory and non directory, this passes down the type of path
being checked to excluded() function.
A downside is that the recursive directory walk may need to run
lstat(2) more often on systems whose "struct dirent" do not give
the type of the entry; earlier it did not have to do so for an
excluded path, but we now need to figure out if a path is a
directory before deciding to exclude it. This is especially bad
because an idea similar to the earlier CE_UPTODATE optimization
to reduce number of lstat(2) calls would by definition not apply
to the codepaths involved, as (1) directories will not be
registered in the index, and (2) excluded paths will not be in
the index anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Operations that walk directories or trees, which potentially need to
consult the .gitignore files, used to always try to open the .gitignore
file every time they entered a new directory, even when they ended up
not needing to call excluded() function to see if a path in the
directory is ignored. This was done by push/pop exclude_per_directory()
functions that managed the data in a stack.
This changes the directory walking API to remove the need to call these
two functions. Instead, the directory walk data structure caches the
data used by excluded() function the last time, and lazily reuses it as
much as possible. Among the data the last check used, the ones from
deeper directories that the path we are checking is outside are
discarded, data from the common leading directories are reused, and then
the directories between the common directory and the directory the path
being checked is in are checked for .gitignore file. This is very
similar to the way gitattributes are handled.
This API change also fixes "ls-files -c -i", which called excluded()
without setting up the gitignore data via the old push/pop functions.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are inconsistencies in the way commands currently handle
the core.excludesfile configuration variable. The problem is
the variable is too new to be noticed by anything other than
git-add and git-status.
* git-ls-files does not notice any of the "ignore" files by
default, as it predates the standardized set of ignore files.
The calling scripts established the convention to use
.git/info/exclude, .gitignore, and later core.excludesfile.
* git-add and git-status know about it because they call
add_excludes_from_file() directly with their own notion of
which standard set of ignore files to use. This is just a
stupid duplication of code that need to be updated every time
the definition of the standard set of ignore files is
changed.
* git-read-tree takes --exclude-per-directory=<gitignore>,
not because the flexibility was needed. Again, this was
because the option predates the standardization of the ignore
files.
* git-merge-recursive uses hardcoded per-directory .gitignore
and nothing else. git-clean (scripted version) does not
honor core.* because its call to underlying ls-files does not
know about it. git-clean in C (parked in 'pu') doesn't either.
We probably could change git-ls-files to use the standard set
when no excludes are specified on the command line and ignore
processing was asked, or something like that, but that will be a
change in semantics and might break people's scripts in a subtle
way. I am somewhat reluctant to make such a change.
On the other hand, I think it makes perfect sense to fix
git-read-tree, git-merge-recursive and git-clean to follow the
same rule as other commands. I do not think of a valid use case
to give an exclude-per-directory that is nonstandard to
read-tree command, outside a "negative" test in the t1004 test
script.
This patch is the first step to untangle this mess.
The next step would be to teach read-tree, merge-recursive and
clean (in C) to use setup_standard_excludes().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Try to avoid a lot of work scanning for excluded files,
by caching some more information when setting up the exclusion
data structure.
Speeds up 'git runstatus' on a repository containing the Qt sources by 30% and
reduces the amount of instructions executed (as measured by valgrind) by a
factor of 2.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was a function called remove_empty_dir_recursive() buried
in refs.c. Expose a slightly enhanced version in dir.h: it can now
optionally remove a non-empty directory.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function get_relative_cwd() works just as getcwd(), only that it
takes an absolute path as additional parameter, returning the prefix
of the current working directory relative to the given path. If the
cwd is no subdirectory of the given path, it returns NULL.
is_inside_dir() is just a trivial wrapper over get_relative_cwd().
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, the code would always set up the excludes, and then manually
pick through the pathspec we were given, assuming that non-added but
existing paths were just ignored. This was mostly correct, but would
erroneously mark a totally empty directory as 'ignored'.
Instead, we now use the collect_ignored option of dir_struct, which
unambiguously tells us whether a path was ignored. This simplifies the
code, and means empty directories are now just not mentioned at all.
Furthermore, we now conditionally ask dir_struct to respect excludes,
depending on whether the '-f' flag has been set. This means we don't have
to pick through the result, checking for an 'ignored' flag; ignored entries
were either added or not in the first place.
We can safely get rid of the special 'ignored' flags to dir_entry, which
were not used anywhere else.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonas Fonseca <fonseca@diku.dk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When set, this option will cause read_directory to keep
track of which entries were ignored. While this shouldn't
effect functionality in most cases, it can make warning
messages to the user much more useful.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the promised cleaned-up version of teaching directory traversal
(ie the "read_directory()" logic) about subprojects. That makes "git add"
understand to add/update subprojects.
It now knows to look at the index file to see if a directory is marked as
a subproject, and use that as information as whether it should be recursed
into or not.
It also generally cleans up the handling of directory entries when
traversing the working tree, by splitting up the decision-making process
into small functions of their own, and adding a fair number of comments.
Finally, it teaches "add_file_to_cache()" that directory names can have
slashes at the end, since the directory traversal adds them to make the
difference between a file and a directory clear (it always did that, but
my previous too-ugly-to-apply subproject patch had a totally different
path for subproject directories and avoided the slash for that case).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The way things are set up, you can now pass a "pathspec" to the
"read_directory()" function. If you pass NULL, it acts exactly
like it used to do (read everything). If you pass a non-NULL
pointer, it will simplify it into a "these are the prefixes
without any special characters", and stop any readdir() early if
the path in question doesn't match any of the prefixes.
NOTE! This does *not* obviate the need for the caller to do the *exact*
pathspec match later. It's a first-level filter on "read_directory()", but
it does not do the full pathspec thing. Maybe it should. But in the
meantime, builtin-add.c really does need to do first
read_directory(dir, .., pathspec);
if (pathspec)
prune_directory(dir, pathspec, baselen);
ie the "prune_directory()" part will do the *exact* pathspec pruning,
while the "read_directory()" will use the pathspec just to do some quick
high-level pruning of the directories it will recurse into.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When '*.ig' is ignored, and you have two files f.ig and d.ig/foo
in the working tree,
$ git add .
correctly ignored f.ig but failed to ignore d.ig/foo. This was
caused by a thinko in an earlier commit 4888c534, when we tried
to allow adding otherwise ignored files.
After reverting that commit, this takes a much simpler approach.
When we have an unmatched pathspec that talks about an existing
pathname, we know it is an ignored path the user tried to add,
so we include it in the set of paths directory walker returned.
This does not let you say "git add -f D" on an ignored directory
D and add everything under D. People can submit a patch to
further allow it if they want to, but I think it is a saner
behaviour to require explicit paths to be spelled out in such a
case.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This updates the return value from match_pathspec() so that the
caller can tell cases between exact match, leading pathname
match (i.e. file "foo/bar" matches a pathspec "foo"), or
filename glob match. This can be used to prevent "rm dir" from
removing "dir/file" without explicitly asking for recursive
behaviour with -r flag, for example.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This follows up commit ed93b449 where we removed overcautious
"working file will be lost" check.
A new option "--exclude-per-directory=.gitignore" can be used to
tell the "git-read-tree" command that the user does not mind
losing contents in untracked files in the working tree, if they
need to be overwritten by a merge (either a two-way "switch
branches" merge, or a three-way merge).
Signed-off-by: Junio C Hamano <junkio@cox.net>
This creates a new git-runstatus which should do roughly the same thing
as the run_status function from git-commit.sh. Except for color support,
the main focus has been to keep the output identical, so that it can be
verified as correct and then used as a C platform for other improvements to
the status printing code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This moves the code to add the per-directory ignore files for the base
directory into the library routine.
That not only allows us to turn the function push_exclude_per_directory()
static again, it also simplifies the library interface a lot (the caller
no longer needs to worry about any of the per-directory exclude files at
all).
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This moves the core directory traversal and filename exclusion logic
into the general git library, making it available for other users
directly.
If we ever want to do "git commit" or "git add" as a built-in (and we
do), we want to be able to handle most of git-ls-files as a library.
NOTE! Not all of git-ls-files is libified by this. The index matching
and pathspec prefix calculation is still in ls-files.c, but this is a
big part of it.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>