1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-14 21:23:03 +01:00
Commit graph

113 commits

Author SHA1 Message Date
Junio C Hamano
4197361e39 Merge branch 'mg/more-textconv'
Make "git grep" and "git show" pay attention to --textconv when
dealing with blob objects.

* mg/more-textconv:
  grep: honor --textconv for the case rev:path
  grep: allow to use textconv filters
  t7008: demonstrate behavior of grep with textconv
  cat-file: do not die on --textconv without textconv filters
  show: honor --textconv for blobs
  diff_opt: track whether flags have been set explicitly
  t4030: demonstrate behavior of show with textconv
2013-10-23 13:21:31 -07:00
Junio C Hamano
b02f5aeda6 Merge branch 'jl/submodule-mv'
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.

* jl/submodule-mv: (53 commits)
  rm: delete .gitmodules entry of submodules removed from the work tree
  mv: update the path entry in .gitmodules for moved submodules
  submodule.c: add .gitmodules staging helper functions
  mv: move submodules using a gitfile
  mv: move submodules together with their work trees
  rm: do not set a variable twice without intermediate reading.
  t6131 - skip tests if on case-insensitive file system
  parse_pathspec: accept :(icase)path syntax
  pathspec: support :(glob) syntax
  pathspec: make --literal-pathspecs disable pathspec magic
  pathspec: support :(literal) syntax for noglob pathspec
  kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
  parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
  parse_pathspec: make sure the prefix part is wildcard-free
  rename field "raw" to "_raw" in struct pathspec
  tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
  remove match_pathspec() in favor of match_pathspec_depth()
  remove init_pathspec() in favor of parse_pathspec()
  remove diff_tree_{setup,release}_paths
  convert common_prefix() to use struct pathspec
  ...
2013-09-09 14:36:15 -07:00
Stefan Beller
d5d09d4754 Replace deprecated OPT_BOOLEAN by OPT_BOOL
This task emerged from b04ba2bb (parse-options: deprecate OPT_BOOLEAN,
2011-09-27). All occurrences of the respective variables have
been reviewed and none of them relied on the counting up mechanism,
but all of them were using the variable as a true boolean.

This patch does not change semantics of any command intentionally.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-05 11:32:19 -07:00
Junio C Hamano
988f98f61f Merge branch 'jx/clean-interactive'
Add "interactive" mode to "git clean".

The early part to refactor relative path related helper functions
looked sensible.

* jx/clean-interactive:
  test: run testcases with POSIX absolute paths on Windows
  test: add t7301 for git-clean--interactive
  git-clean: add documentation for interactive git-clean
  git-clean: add ask each interactive action
  git-clean: add select by numbers interactive action
  git-clean: add filter by pattern interactive action
  git-clean: use a git-add-interactive compatible UI
  git-clean: add colors to interactive git-clean
  git-clean: show items of del_list in columns
  git-clean: add support for -i/--interactive
  git-clean: refactor git-clean into two phases
  write_name{_quoted_relative,}(): remove redundant parameters
  quote_path_relative(): remove redundant parameter
  quote.c: substitute path_relative with relative_path
  path.c: refactor relative_path(), not only strip prefix
  test: add test cases for relative_path
2013-07-22 11:24:11 -07:00
Nguyễn Thái Ngọc Duy
7327d3d1b7 convert {read,fill}_directory to take struct pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:08 -07:00
Nguyễn Thái Ngọc Duy
6330a17199 parse_pathspec: add special flag for max_depth feature
match_pathspec_depth() and tree_entry_interesting() check max_depth
field in order to support "git grep --max-depth". The feature
activation is tied to "recursive" field, which led to some unwanted
activation, e.g. 5c8eeb8 (diff-index: enable recursive pathspec
matching in unpack_trees - 2012-01-15).

This patch decouples the activation from "recursive" field, puts it in
"magic" field instead. This makes sure that only "git grep" can
activate this feature. And because parse_pathspec knows when the
feature is not used, it does not need to sort pathspec (required for
max_depth to work correctly). A small win for non-grep cases.

Even though a new magic flag is introduced, no magic syntax is. The
magic can be only enabled by parse_pathspec() caller. We might someday
want to support ":(maxdepth:10)src." It all depends on actual use
cases.

max_depth feature cannot be enabled via init_pathspec() anymore. But
that's ok because init_pathspec() is on its way to /dev/null.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:06 -07:00
Nguyễn Thái Ngọc Duy
0fdc2ae512 convert some get_pathspec() calls to parse_pathspec()
These call sites follow the pattern:

   paths = get_pathspec(prefix, argv);
   init_pathspec(&pathspec, paths);

which can be converted into a single parse_pathspec() call.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:06 -07:00
Nguyễn Thái Ngọc Duy
64acde94ef move struct pathspec and related functions to pathspec.[ch]
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:06 -07:00
Nguyễn Thái Ngọc Duy
9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Jiang Xin
39598f9983 quote_path_relative(): remove redundant parameter
quote_path_relative() used to take a counted string as its parameter
(the string to be quoted).  With an earlier change, it now uses
relative_path() that does not take a counted string, and we have
been passing only the pointer to the string since then.

Remove the length parameter from quote_path_relative() to show that
this parameter was redundant.  All the changed lines show that the
caller passed either -1 (to ask the function run strlen() on the
string), or the length of the string, so the earlier conversion was
safe.

All the callers of quote_path_relative() that used to take counted string
have been audited to make sure that they are passing length of the actual
string (or -1 to ask the callee run strlen())

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-26 11:16:48 -07:00
Michael J Gruber
afa15f3cd8 grep: honor --textconv for the case rev:path
Make "grep" honor the "--textconv" option also for the object case, i.e.
when used with an argument "rev:path".

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-10 10:27:34 -07:00
Jeff King
335ec3bf41 grep: allow to use textconv filters
Recently and not so recently, we made sure that log/grep type operations
use textconv filters when a userfacing diff would do the same:

ef90ab6 (pickaxe: use textconv for -S counting, 2012-10-28)
b1c2f57 (diff_grep: use textconv buffers for add/deleted files, 2012-10-28)
0508fe5 (combine-diff: respect textconv attributes, 2011-05-23)

"git grep" currently does not use textconv filters at all, that is
neither for displaying the match and context nor for the actual grepping,
even when requested by --textconv.

Introduce an option "--textconv" which makes git grep use any configured
textconv filters for grepping and output purposes. It is off by default.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-10 10:27:31 -07:00
Junio C Hamano
870987dec7 Merge branch 'jk/fully-peeled-packed-ref'
Not that we do not actively encourage having annotated tags outside
refs/tags/ hierarchy, but they were not advertised correctly to the
ls-remote and fetch with recent version of Git.

* jk/fully-peeled-packed-ref:
  pack-refs: add fully-peeled trait
  pack-refs: write peeled entry for non-tags
  use parse_object_or_die instead of die("bad object")
  avoid segfaults on parse_object failure
2013-03-25 14:01:07 -07:00
Jeff King
f7892d1817 use parse_object_or_die instead of die("bad object")
Some call-sites do:

  o = parse_object(sha1);
  if (!o)
	  die("bad object %s", some_name);

We can now handle that as a one-liner, and get more
consistent output.

In the third case of this patch, it looks like we are losing
information, as the existing message also outputs the sha1
hex; however, parse_object will already have written a more
specific complaint about the sha1, so there is no point in
repeating it here.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-17 12:52:14 -07:00
Nguyễn Thái Ngọc Duy
0b0ecaac2a grep: avoid accepting ambiguous revision
Unlike other commands that take both revs and pathspecs without "--"
disamiguators only when the boundary is clear, "git grep" treated
what can be interpreted as a rev as-is, without making sure that it
could also have meant a pathspec.  E.g.

    $ git grep -e foo master

when 'master' is in the working tree, should have triggered an
ambiguity error, but it didn't, and searched in the tree of the
commit named by 'master'.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-21 16:57:38 -08:00
Jeff King
e034d1bb92 Merge branch 'nd/grep-true-path'
"git grep -e pattern <tree>" asked the attribute system to read
"<tree>:.gitattributes" file in the working tree, which was
nonsense.

* nd/grep-true-path:
  grep: stop looking at random places for .gitattributes
2012-10-29 04:13:16 -04:00
Nguyễn Thái Ngọc Duy
55c61688ea grep: stop looking at random places for .gitattributes
grep searches for .gitattributes using "name" field in struct
grep_source but that field is not real on-disk path name. For example,
"grep pattern rev" fills the field with "rev:path", and Git looks for
.gitattributes in the (non-existent but exploitable) path "rev:path"
instead of "path".

This patch passes real paths down to grep_source_load_driver() when:

 - grep on work tree
 - grep on the index
 - grep a commit (or a tag if it points to a commit)

so that these cases look up .gitattributes at proper paths.
.gitattributes lookup is disabled in all other cases.

Initial-work-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-12 08:24:44 -07:00
Junio C Hamano
c5c31d3381 grep: move pattern-type bits support to top-level grep.[ch]
Switching between -E/-G/-P/-F correctly needs a lot more than just
flipping opt->regflags bit these days, and we have a nice helper
function buried in builtin/grep.c for the sole use of "git grep".

Extract it so that "log --grep" family can also use it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09 23:21:29 -07:00
Junio C Hamano
7687a0541e grep: move the configuration parsing logic to grep.[ch]
The configuration handling is a library-ish part of this program,
that is not specific to "git grep" command.  It should be reusable
by "log" and others.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09 16:17:50 -07:00
Junio C Hamano
15fabd1bbd builtin/grep.c: make configuration callback more reusable
The grep_config() function takes one instance of grep_opt as its
callback parameter, and populates it by running git_config().

This has three practical implications:

 - You have to have an instance of grep_opt already when you call
   the configuration, but that is not necessarily always true.  You
   may be trying to initialize the grep_filter member of rev_info,
   but are not ready to call init_revisions() on it yet.

 - It is not easy to enhance grep_config() in such a way to make it
   cascade to other callback functions to grab other variables in
   one call of git_config(); grep_config() can be cascaded into from
   other callbacks, but it has to be at the leaf level of a cascade.

 - If you ever need to use more than one instance of grep_opt, you
   will have to open and read the configuration file(s) every time
   you initialize them.

Rearrange the configuration mechanism and model it after how diff
configuration variables are handled.  An early call to git_config()
reads and remembers the values taken from the configuration in the
default "template", and a separate call to grep_init() uses this
template to instantiate a grep_opt.

The next step will be to move some of this out of this file so that
the other user of the grep machinery (i.e. "log") can use it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09 16:04:12 -07:00
Junio C Hamano
31d69db340 Merge branch 'jc/maint-log-grep-all-match-1' into maint
* jc/maint-log-grep-all-match-1:
  grep.c: make two symbols really file-scope static this time
  t7810-grep: test --all-match with multiple --grep and --author options
  t7810-grep: test interaction of multiple --grep and --author options
  t7810-grep: test multiple --author with --all-match
  t7810-grep: test multiple --grep with and without --all-match
  t7810-grep: bring log --grep tests in common form
  grep.c: mark private file-scope symbols as static
  log: document use of multiple commit limiting options
  log --grep/--author: honor --all-match honored for multiple --grep patterns
  grep: show --debug output only once
  grep: teach --debug option to dump the parse tree
2012-09-29 22:30:56 -07:00
Junio C Hamano
3d7535e424 Merge branch 'jc/maint-log-grep-all-match'
Fix a long-standing bug in "git log --grep" when multiple "--grep"
are used together with "--all-match" and "--author" or "--committer".

* jc/maint-log-grep-all-match:
  t7810-grep: test --all-match with multiple --grep and --author options
  t7810-grep: test interaction of multiple --grep and --author options
  t7810-grep: test multiple --author with --all-match
  t7810-grep: test multiple --grep with and without --all-match
  t7810-grep: bring log --grep tests in common form
  grep.c: mark private file-scope symbols as static
  log: document use of multiple commit limiting options
  log --grep/--author: honor --all-match honored for multiple --grep patterns
  grep: show --debug output only once
  grep: teach --debug option to dump the parse tree
2012-09-18 14:37:54 -07:00
Michael J Gruber
208f5aa426 grep: show --debug output only once
When threaded grep is in effect, the patterns are duplicated and
recompiled for each thread. Avoid "--debug" output during the
recompilation so that the output is given once instead of "1+nthreads"
times.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-14 10:11:44 -07:00
Junio C Hamano
17bf35a3c7 grep: teach --debug option to dump the parse tree
Our "grep" allows complex boolean expressions to be formed to match
each individual line with operators like --and, '(', ')' and --not.
Introduce the "--debug" option to show the parse tree to help people
who want to debug and enhance it.

Also "log" learns "--grep-debug" option to do the same.  The command
line parser to the log family is a lot more limited than the general
"git grep" parser, but it has special handling for header matching
(e.g. "--author"), and a parse tree is valuable when working on it.

Note that "--all-match" is *not* any individual node in the parse
tree.  It is an instruction to the evaluator to check all the nodes
in the top-level backbone have matched and reject a document as
non-matching otherwise.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-14 10:10:35 -07:00
Junio C Hamano
096bbd6537 Merge branch 'nd/i18n-parseopt-help'
A lot of i18n mark-up for the help text from "git <cmd> -h".

* nd/i18n-parseopt-help: (66 commits)
  Use imperative form in help usage to describe an action
  Reduce translations by using same terminologies
  i18n: write-tree: mark parseopt strings for translation
  i18n: verify-tag: mark parseopt strings for translation
  i18n: verify-pack: mark parseopt strings for translation
  i18n: update-server-info: mark parseopt strings for translation
  i18n: update-ref: mark parseopt strings for translation
  i18n: update-index: mark parseopt strings for translation
  i18n: tag: mark parseopt strings for translation
  i18n: symbolic-ref: mark parseopt strings for translation
  i18n: show-ref: mark parseopt strings for translation
  i18n: show-branch: mark parseopt strings for translation
  i18n: shortlog: mark parseopt strings for translation
  i18n: rm: mark parseopt strings for translation
  i18n: revert, cherry-pick: mark parseopt strings for translation
  i18n: rev-parse: mark parseopt strings for translation
  i18n: reset: mark parseopt strings for translation
  i18n: rerere: mark parseopt strings for translation
  i18n: status: mark parseopt strings for translation
  i18n: replace: mark parseopt strings for translation
  ...
2012-09-07 11:09:09 -07:00
Nguyễn Thái Ngọc Duy
f63cf8c9fb Use imperative form in help usage to describe an action
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-22 12:02:28 -07:00
Nguyễn Thái Ngọc Duy
4b407bc506 i18n: grep: mark parseopt strings for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-20 12:23:17 -07:00
J Smith
84befcd0a4 grep: add a grep.patternType configuration setting
The grep.extendedRegexp configuration setting enables the -E flag on grep
by default but there are no equivalents for the -G, -F and -P flags.

Rather than adding an additional setting for grep.fooRegexp for current
and future pattern matching options, add a grep.patternType setting that
can accept appropriate values for modifying the default grep pattern
matching behavior. The current values are "basic", "extended", "fixed",
"perl" and "default" for setting -G, -E, -F, -P and the default behavior
respectively.

When grep.patternType is set to a value other than "default", the
grep.extendedRegexp setting is ignored. The value of "default" restores
the current default behavior, including the grep.extendedRegexp
behavior.

Signed-off-by: J Smith <dark.panda@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-03 09:58:02 -07:00
Junio C Hamano
9ca724933a Merge branch 'mm/verify-filename-fix' into maint
"git diff COPYING HEAD:COPYING" gave a nonsense error message that
claimed that the treeish HEAD did not have COPYING in it.

* mm/verify-filename-fix:
  verify_filename(): ask the caller to chose the kind of diagnosis
  sha1_name: do not trigger detailed diagnosis for file arguments
2012-07-11 12:45:49 -07:00
Junio C Hamano
08080894b7 Merge branch 'mm/verify-filename-fix'
"git diff COPYING HEAD:COPYING" gave a nonsense error message that
claimed that the treeish HEAD did not have COPYING in it.
2012-06-28 15:19:32 -07:00
Matthieu Moy
023e37c377 verify_filename(): ask the caller to chose the kind of diagnosis
verify_filename() can be called in two different contexts. Either we
just tried to interpret a string as an object name, and it fails, so
we try looking for a working tree file (i.e. we finished looking at
revs that come earlier on the command line, and the next argument
must be a pathname), or we _know_ that we are looking for a
pathname, and shouldn't even try interpreting the string as an
object name.

For example, with this change, we get:

  $ git log COPYING HEAD:inexistant
  fatal: HEAD:inexistant: no such path in the working tree.
  Use '-- <path>...' to specify paths that do not exist locally.
  $ git log HEAD:inexistant
  fatal: Path 'inexistant' does not exist in 'HEAD'

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-06-18 15:21:42 -07:00
Junio C Hamano
2147cb2762 Merge branch 'rs/maint-grep-F' into maint
"git grep -e '$pattern'", unlike the case where the patterns are read from
a file, did not treat individual lines in the given pattern argument as
separate regular expressions as it should.

By René Scharfe
* rs/maint-grep-F:
  grep: stop leaking line strings with -f
  grep: support newline separated pattern list
  grep: factor out do_append_grep_pat()
  grep: factor out create_grep_pat()
2012-06-01 13:01:41 -07:00
Junio C Hamano
fca9e0013e Merge branch 'rs/maint-grep-F'
"git grep -e '$pattern'", unlike the case where the patterns are read from
a file, did not treat individual lines in the given pattern argument as
separate regular expressions as it should.
2012-05-25 12:04:19 -07:00
René Scharfe
ec83061156 grep: stop leaking line strings with -f
When reading patterns from a file, we pass the lines as allocated string
buffers to append_grep_pat() and never free them.  That's not a problem
because they are needed until the program ends anyway.

However, now that the function duplicates the pattern string, we can
reuse the strbuf after calling that function.  This simplifies the code
a bit and plugs a minor memory leak.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-21 15:02:08 -07:00
Junio C Hamano
66b8800e53 Merge branch 'rs/no-no-no-parseopt'
* rs/no-no-no-parseopt:
  parse-options: remove PARSE_OPT_NEGHELP
  parse-options: allow positivation of options starting, with no-
  test-parse-options: convert to OPT_BOOL()

Conflicts:
	builtin/grep.c
2012-03-01 20:59:31 -08:00
René Scharfe
cbb08c2e0b parse-options: remove PARSE_OPT_NEGHELP
PARSE_OPT_NEGHELP is confusing because short options defined with that
flag do the opposite of what the helptext says. It is also not needed
anymore now that options starting with no- can be negated by removing
that prefix. Convert its only two users to OPT_NEGBIT() and OPT_BOOL()
and then remove support for PARSE_OPT_NEGHELP.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-28 11:48:11 -08:00
Junio C Hamano
10439fc0ef Merge branch 'jk/grep-binary-attribute'
* jk/grep-binary-attribute:
  grep: pre-load userdiff drivers when threaded
  grep: load file data after checking binary-ness
  grep: respect diff attributes for binary-ness
  grep: cache userdiff_driver in grep_source
  grep: drop grep_buffer's "name" parameter
  convert git-grep to use grep_source interface
  grep: refactor the concept of "grep source" into an object
  grep: move sha1-reading mutex into low-level code
  grep: make locking flag global
2012-02-14 12:57:18 -08:00
Jeff King
6680a0874f drop odd return value semantics from userdiff_config
When the userdiff_config function was introduced in be58e70
(diff: unify external diff and funcname parsing code,
2008-10-05), it used a return value convention unlike any
other config callback. Like other callbacks, it used "-1" to
signal error. But it returned "1" to indicate that it found
something, and "0" otherwise; other callbacks simply
returned "0" to indicate that no error occurred.

This distinction was necessary at the time, because the
userdiff namespace overlapped slightly with the color
configuration namespace. So "diff.color.foo" could mean "the
'foo' slot of diff coloring" or "the 'foo' component of the
"color" userdiff driver". Because the color-parsing code
would die on an unknown color slot, we needed the userdiff
code to indicate that it had matched the variable, letting
us bypass the color-parsing code entirely.

Later, in 8b8e862 (ignore unknown color configuration,
2009-12-12), the color-parsing code learned to silently
ignore unknown slots. This means we no longer need to
protect userdiff-matched variables from reaching the
color-parsing code.

We can therefore change the userdiff_config calling
convention to a more normal one. This drops some code from
each caller, which is nice. But more importantly, it reduces
the cognitive load for readers who may wonder why
userdiff_config is unlike every other config callback.

There's no need to add a new test confirming that this
works; t4020 already contains a test that sets
diff.color.external.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-07 10:44:54 -08:00
Jeff King
9dd5245c10 grep: pre-load userdiff drivers when threaded
The low-level grep_source code will automatically load the
userdiff driver to see whether a file is binary. However,
when we are threaded, it will load the drivers in a
non-deterministic order, handling each one as its assigned
thread happens to be scheduled.

Meanwhile, the attribute lookup code (which underlies the
userdiff driver lookup) is optimized to handle paths in
sequential order (because they tend to share the same
gitattributes files). Multi-threading the lookups destroys
the locality and makes this optimization less effective.

We can fix this by pre-loading the userdiff driver in the
main thread, before we hand off the file to a worker thread.
My best-of-five for "git grep foo" on the linux-2.6
repository went from:

  real    0m0.391s
  user    0m1.708s
  sys     0m0.584s

to:

  real    0m0.360s
  user    0m1.576s
  sys     0m0.572s

Not a huge speedup, but it's quite easy to do. The only
trick is that we shouldn't perform this optimization if "-a"
was used, in which case we won't bother checking whether
the files are binary at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King
8f24a6323e convert git-grep to use grep_source interface
The grep_source interface (as opposed to grep_buffer) will
eventually gives us a richer interface for telling the
low-level grep code about our buffers. Eventually this will
lead to things like better binary-file handling. For now, it
lets us drop a lot of now-redundant code.

The conversion is mostly straight-forward. One thing to note
is that the memory ownership rules for "struct grep_source"
are different than the "struct work_item" found here (the
former will copy things like the filename, rather than
taking ownership). Therefore you will also see some slight
tweaking of when filename buffers are released.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King
b3aeb285d0 grep: move sha1-reading mutex into low-level code
The multi-threaded git-grep code needs to serialize access
to the thread-unsafe read_sha1_file call. It does this with
a mutex that is local to builtin/grep.c.

Let's instead push this down into grep.c, where it can be
used by both builtin/grep.c and grep.c. This will let us
safely teach the low-level grep.c code tricks that involve
reading from the object db.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:07 -08:00
Jeff King
78db6ea9dc grep: make locking flag global
The low-level grep code traditionally didn't care about
threading, as it doesn't do any threading itself and didn't
call out to other non-thread-safe code.  That changed with
0579f91 (grep: enable threading with -p and -W using lazy
attribute lookup, 2011-12-12), which pushed the lookup of
funcname attributes (which is not thread-safe) into the
low-level grep code.

As a result, the low-level code learned about a new global
"grep_attr_mutex" to serialize access to the attribute code.
A multi-threaded caller (e.g., builtin/grep.c) is expected
to initialize the mutex and set "use_threads" in the
grep_opt structure. The low-level code only uses the lock if
use_threads is set.

However, putting the use_threads flag into the grep_opt
struct is not the most logical place. Whether threading is
in use is not something that matters for each call to
grep_buffer, but is instead global to the whole program
(i.e., if any thread is doing multi-threaded grep, every
other thread, even if it thinks it is doing its own
single-threaded grep, would need to use the locking).  In
practice, this distinction isn't a problem for us, because
the only user of multi-threaded grep is "git-grep", which
does nothing except call grep.

This patch turns the opt->use_threads flag into a global
flag. More important than the nit-picking semantic argument
above is that this means that the locking functions don't
need to actually have access to a grep_opt to know whether
to lock. Which in turn can make adding new locks simpler, as
we don't need to pass around a grep_opt.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:07 -08:00
Albert Yale
50dd0f2fd9 grep: fix -l/-L interaction with decoration lines
In threaded mode, git-grep emits file breaks (enabled with context, -W
and --break) into the accumulation buffers even if they are not
required.  The output collection thread then uses skip_first_line to
skip the first such line in the output, which would otherwise be at
the very top.

This is wrong when the user also specified -l/-L/-c, in which case
every line is relevant.  While arguably giving these options together
doesn't make any sense, git-grep has always quietly accepted it.  So
do not skip anything in these cases.

Signed-off-by: Albert Yale <surfingalbert@gmail.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-23 10:49:34 -08:00
Thomas Rast
53b8d931b6 grep: disable threading in non-worktree case
Measurements by various people have shown that grepping in parallel is
not beneficial when the object store is involved.  For example, with a
simple regex:

  Threads     | --cached case            | worktree case
  ----------------------------------------------------------------
  8 (default) | 2.88u 0.21s 0:02.94real  | 0.19u 0.32s 0:00.16real
  4           | 2.89u 0.29s 0:02.99real  | 0.16u 0.34s 0:00.17real
  2           | 2.83u 0.36s 0:02.87real  | 0.18u 0.32s 0:00.26real
  NO_PTHREADS | 2.16u 0.08s 0:02.25real  | 0.12u 0.17s 0:00.31real

This happens because all the threads contend on read_sha1_mutex almost
all of the time.  A more complex regex allows the threads to do more
work in parallel, but as Jeff King found out, the "super boost" (much
higher clock when only one core is active) feature of recent CPUs
still causes the unthreaded case to win by a large margin.

So until the pack machinery allows unthreaded access, we disable
grep's threading in all but the worktree case.

Helped-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-16 15:47:25 -08:00
Thomas Rast
0579f91dd7 grep: enable threading with -p and -W using lazy attribute lookup
Lazily load the userdiff attributes in match_funcname().  Use a
separate mutex around this loading to protect the (not thread-safe)
attributes machinery.  This lets us re-enable threading with -p and
-W while reducing the overhead caused by looking up attributes.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-16 15:47:10 -08:00
Junio C Hamano
62cdb6b23a Merge branch 'nd/misc-cleanups'
* nd/misc-cleanups:
  unpack_object_header_buffer(): clear the size field upon error
  tree_entry_interesting: make use of local pointer "item"
  tree_entry_interesting(): give meaningful names to return values
  read_directory_recursive: reduce one indentation level
  get_tree_entry(): do not call find_tree_entry() on an empty tree
  tree-walk.c: do not leak internal structure in tree_entry_len()
2011-12-05 15:10:20 -08:00
Nguyễn Thái Ngọc Duy
d688cf07b1 tree_entry_interesting(): give meaningful names to return values
It is a basic code hygiene to avoid magic constants that are unnamed.
Besides, this helps extending the value later on for "interesting, but
cannot decide if the entry truely matches yet" (ie. prefix matches)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27 11:38:24 -07:00
Nguyễn Thái Ngọc Duy
0de1633783 tree-walk.c: do not leak internal structure in tree_entry_len()
tree_entry_len() does not simply take two random arguments and return
a tree length. The two pointers must point to a tree item structure,
or struct name_entry. Passing random pointers will return incorrect
value.

Force callers to pass struct name_entry instead of two pointers (with
hope that they don't manually construct struct name_entry themselves)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27 11:08:26 -07:00
Junio C Hamano
764161391f builtin/grep: simplify lock_and_read_sha1_file()
As read_sha1_lock/unlock have been made aware of use_threads,
this caller can be made a lot simpler.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-26 13:09:23 -07:00
Junio C Hamano
1487a12ba2 builtin/grep: make lock/unlock into static inline functions
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-26 13:09:04 -07:00