mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-05 08:47:56 +01:00

1798 lines

41 KiB

C

builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`#include "cache.h"`
			`#include "grep.h"`
grep -p: support user defined regular expressions Respect the userdiff attributes and config settings when looking for lines with function definitions in git grep -p. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:07:24 +02:00			`#include "userdiff.h"`
Move buffer_is_binary() to xdiff-interface.h We already have two instances where we want to determine if a buffer contains binary data as opposed to text. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-05 04:36:11 +02:00			`#include "xdiff-interface.h"`
grep: allow to use textconv filters Recently and not so recently, we made sure that log/grep type operations use textconv filters when a userfacing diff would do the same: ef90ab6 (pickaxe: use textconv for -S counting, 2012-10-28) b1c2f57 (diff_grep: use textconv buffers for add/deleted files, 2012-10-28) 0508fe5 (combine-diff: respect textconv attributes, 2011-05-23) "git grep" currently does not use textconv filters at all, that is neither for displaying the match and context nor for the actual grepping, even when requested by --textconv. Introduce an option "--textconv" which makes git grep use any configured textconv filters for grepping and output purposes. It is off by default. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-10 17:10:15 +02:00			`#include "diff.h"`
			`#include "diffcore.h"`
grep.c: mark private file-scope symbols as static Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-15 23:04:36 +02:00			`static int grep_source_load(struct grep_source *gs);`
			`static int grep_source_is_binary(struct grep_source *gs);`

grep: move the configuration parsing logic to grep.[ch] The configuration handling is a library-ish part of this program, that is not specific to "git grep" command. It should be reusable by "log" and others. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-10-10 01:17:50 +02:00			`static struct grep_opt grep_defaults;`

			`/*`
			`* Initialize the grep_defaults template with hardcoded defaults.`
			`* We could let the compiler do this, but without C99 initializers`
			`* the code gets unwieldy and unreadable, so...`
			`*/`
			`void init_grep_defaults(void)`
			`{`
			`struct grep_opt *opt = &grep_defaults;`
revisions: initialize revs->grep_filter using grep_init() Instead of using the hand-rolled initialization sequence, use grep_init() to populate the necessary bits. This opens the door to allow the calling commands to optionally read grep.* configuration variables via git_config() if they want to. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-10-10 01:40:03 +02:00			`static int run_once;`

			`if (run_once)`
			`return;`
			`run_once++;`
			`memset(opt, 0, sizeof(*opt));`
			`opt->relative = 1;`
			`opt->pathname = 1;`
			`opt->regflags = REG_NEWLINE;`
			`opt->max_depth = -1;`
			`opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED;`
			`opt->extended_regexp_option = 0;`
color: add color_set helper for copying raw colors To set up default colors, we sometimes strcpy() from the default string literals into our color buffers. This isn't a bug (assuming the destination is COLOR_MAXLEN bytes), but makes it harder to audit the code for problematic strcpy calls. Let's introduce a color_set which copies under the assumption that there are COLOR_MAXLEN bytes in the destination (of course you can call it on a smaller buffer, so this isn't providing a huge amount of safety, but it's more convenient than calling xsnprintf yourself). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-09-24 23:08:21 +02:00			`color_set(opt->color_context, "");`
			`color_set(opt->color_filename, "");`
			`color_set(opt->color_function, "");`
			`color_set(opt->color_lineno, "");`
			`color_set(opt->color_match_context, GIT_COLOR_BOLD_RED);`
			`color_set(opt->color_match_selected, GIT_COLOR_BOLD_RED);`
			`color_set(opt->color_selected, "");`
			`color_set(opt->color_sep, GIT_COLOR_CYAN);`
			`opt->color = -1;`
			`}`

			`static int parse_pattern_type_arg(const char *opt, const char *arg)`
			`{`
			`if (!strcmp(arg, "default"))`
			`return GREP_PATTERN_TYPE_UNSPECIFIED;`
			`else if (!strcmp(arg, "basic"))`
			`return GREP_PATTERN_TYPE_BRE;`
			`else if (!strcmp(arg, "extended"))`
			`return GREP_PATTERN_TYPE_ERE;`
			`else if (!strcmp(arg, "fixed"))`
			`return GREP_PATTERN_TYPE_FIXED;`
			`else if (!strcmp(arg, "perl"))`
			`return GREP_PATTERN_TYPE_PCRE;`
			`die("bad %s argument: %s", opt, arg);`
			`}`
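
The name-to-type mapping in `parse_pattern_type_arg()` can also be written as a table lookup. The following is a minimal standalone sketch, not the code git uses: the enum `pattern_type`, its `PATTERN_*` members, and the function `pattern_type_from_name()` are hypothetical stand-ins, and the sketch returns an error value where the original calls `die()`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for git's enum grep_pattern_type. */
enum pattern_type {
	PATTERN_INVALID = -1,
	PATTERN_UNSPECIFIED = 0,
	PATTERN_BRE,
	PATTERN_ERE,
	PATTERN_FIXED,
	PATTERN_PCRE
};

/*
 * Map a configuration value such as "basic" or "perl" to a pattern type.
 * Unknown names yield PATTERN_INVALID instead of terminating the program.
 */
enum pattern_type pattern_type_from_name(const char *name)
{
	static const struct {
		const char *name;
		enum pattern_type type;
	} table[] = {
		{ "default", PATTERN_UNSPECIFIED },
		{ "basic", PATTERN_BRE },
		{ "extended", PATTERN_ERE },
		{ "fixed", PATTERN_FIXED },
		{ "perl", PATTERN_PCRE },
	};
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (!strcmp(name, table[i].name))
			return table[i].type;
	return PATTERN_INVALID;
}
```

A caller that wants the original behaviour would check for `PATTERN_INVALID` and report the offending variable name and value itself.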

			`/*`
			`* Read the configuration file once and store it in`
			`* the grep_defaults template.`
			`*/`
			`int grep_config(const char *var, const char *value, void *cb)`
			`{`
			`struct grep_opt *opt = &grep_defaults;`
			`char *color = NULL;`

			`if (userdiff_config(var, value) < 0)`
			`return -1;`

			`if (!strcmp(var, "grep.extendedregexp")) {`
			`if (git_config_bool(var, value))`
			`opt->extended_regexp_option = 1;`
			`else`
			`opt->extended_regexp_option = 0;`
			`return 0;`
			`}`

			`if (!strcmp(var, "grep.patterntype")) {`
			`opt->pattern_type_option = parse_pattern_type_arg(var, value);`
			`return 0;`
			`}`

			`if (!strcmp(var, "grep.linenumber")) {`
			`opt->linenum = git_config_bool(var, value);`
			`return 0;`
			`}`

grep: add grep.fullName config variable This configuration variable sets the default for the --full-name option. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-17 20:16:05 +01:00			`if (!strcmp(var, "grep.fullname")) {`
			`opt->relative = !git_config_bool(var, value);`
			`return 0;`
			`}`

			`if (!strcmp(var, "color.grep"))`
			`opt->color = git_config_colorbool(var, value);`
			`else if (!strcmp(var, "color.grep.context"))`
			`color = opt->color_context;`
			`else if (!strcmp(var, "color.grep.filename"))`
			`color = opt->color_filename;`
			`else if (!strcmp(var, "color.grep.function"))`
			`color = opt->color_function;`
			`else if (!strcmp(var, "color.grep.linenumber"))`
			`color = opt->color_lineno;`
grep: add color.grep.matchcontext and color.grep.matchselected The config option color.grep.match can be used to specify the highlighting color for matching strings. Add the options matchContext and matchSelected to allow different colors to be specified for matching strings in the context vs. in selected lines. This is similar to the ms and mc specifiers in GNU grep's environment variable GREP_COLORS. Tests are from Zoltan Klinger's earlier attempt to solve the same issue in a different way. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-10-27 19:23:05 +01:00			`else if (!strcmp(var, "color.grep.matchcontext"))`
			`color = opt->color_match_context;`
			`else if (!strcmp(var, "color.grep.matchselected"))`
			`color = opt->color_match_selected;`
			`else if (!strcmp(var, "color.grep.selected"))`
			`color = opt->color_selected;`
			`else if (!strcmp(var, "color.grep.separator"))`
			`color = opt->color_sep;`
			`else if (!strcmp(var, "color.grep.match")) {`
			`int rc = 0;`
			`if (!value)`
			`return config_error_nonbool(var);`
Merge branch 'rs/grep-color-words' Allow painting or not painting (partial) matches in context lines when showing "grep -C<num>" output in color. * rs/grep-color-words: grep: add color.grep.matchcontext and color.grep.matchselected 2014-10-31 19:49:37 +01:00			`rc |= color_parse(value, opt->color_match_context);`
			`rc |= color_parse(value, opt->color_match_selected);`
			`return rc;`
			`}`
			`if (color) {`
			`if (!value)`
			`return config_error_nonbool(var);`
color_parse: do not mention variable name in error message Originally the color-parsing function was used only for config variables. It made sense to pass the variable name so that the die() message could be something like: $ git -c color.branch.plain=bogus branch fatal: bad color value 'bogus' for variable 'color.branch.plain' These days we call it in other contexts, and the resulting error messages are a little confusing: $ git log --pretty='%C(bogus)' fatal: bad color value 'bogus' for variable '--pretty format' $ git config --get-color foo.bar bogus fatal: bad color value 'bogus' for variable 'command line' This patch teaches color_parse to complain only about the value, and then return an error code. Config callers can then propagate that up to the config parser, which mentions the variable name. Other callers can provide a custom message. After this patch these three cases now look like: $ git -c color.branch.plain=bogus branch error: invalid color value: bogus fatal: unable to parse 'color.branch.plain' from command-line config $ git log --pretty='%C(bogus)' error: invalid color value: bogus fatal: unable to parse --pretty format $ git config --get-color foo.bar bogus error: invalid color value: bogus fatal: unable to parse default color value Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-10-07 21:33:09 +02:00			`return color_parse(value, color);`
			`}`
			`return 0;`
			`}`

			`/*`
			`* Initialize one instance of grep_opt and copy the`
			`* default values from the template we read the configuration`
			`* information in an earlier call to git_config(grep_config).`
			`*/`
			`void grep_init(struct grep_opt *opt, const char *prefix)`
			`{`
			`struct grep_opt *def = &grep_defaults;`

			`memset(opt, 0, sizeof(*opt));`
			`opt->prefix = prefix;`
			`opt->prefix_length = (prefix && *prefix) ? strlen(prefix) : 0;`
			`opt->pattern_tail = &opt->pattern_list;`
			`opt->header_tail = &opt->header_list;`

			`opt->color = def->color;`
			`opt->extended_regexp_option = def->extended_regexp_option;`
			`opt->pattern_type_option = def->pattern_type_option;`
			`opt->linenum = def->linenum;`
			`opt->max_depth = def->max_depth;`
			`opt->pathname = def->pathname;`
			`opt->regflags = def->regflags;`
			`opt->relative = def->relative;`

			`color_set(opt->color_context, def->color_context);`
			`color_set(opt->color_filename, def->color_filename);`
			`color_set(opt->color_function, def->color_function);`
			`color_set(opt->color_lineno, def->color_lineno);`
			`color_set(opt->color_match_context, def->color_match_context);`
			`color_set(opt->color_match_selected, def->color_match_selected);`
			`color_set(opt->color_selected, def->color_selected);`
			`color_set(opt->color_sep, def->color_sep);`
			`}`
grep: move pattern-type bits support to top-level grep.[ch] Switching between -E/-G/-P/-F correctly needs a lot more than just flipping opt->regflags bit these days, and we have a nice helper function buried in builtin/grep.c for the sole use of "git grep". Extract it so that "log --grep" family can also use it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-10-03 23:47:48 +02:00			`void grep_commit_pattern_type(enum grep_pattern_type pattern_type, struct grep_opt *opt)`
			`{`
			`if (pattern_type != GREP_PATTERN_TYPE_UNSPECIFIED)`
			`grep_set_pattern_type_option(pattern_type, opt);`
			`else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)`
			`grep_set_pattern_type_option(opt->pattern_type_option, opt);`
			`else if (opt->extended_regexp_option)`
			`grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt);`
			`}`
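
`grep_commit_pattern_type()` encodes a precedence order: an explicit type from the command line wins, then `grep.patternType`, then `grep.extendedRegexp`. The sketch below restates that order as a pure function. It is an illustration under assumptions: `enum ptype`, `struct pattern_config`, and `resolve_pattern_type()` are hypothetical names, and returning `PT_UNSPECIFIED` stands for the original's "leave the options untouched" case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for git's enum grep_pattern_type. */
enum ptype { PT_UNSPECIFIED, PT_BRE, PT_ERE, PT_FIXED, PT_PCRE };

/* Hypothetical, reduced view of the two configuration fields consulted. */
struct pattern_config {
	enum ptype pattern_type_option; /* from grep.patternType */
	int extended_regexp_option;     /* from grep.extendedRegexp */
};

/*
 * Pick the effective pattern type:
 *   1. the command-line choice, if one was given;
 *   2. otherwise grep.patternType, if set;
 *   3. otherwise ERE when grep.extendedRegexp is true;
 *   4. otherwise nothing is selected.
 */
enum ptype resolve_pattern_type(enum ptype cmdline,
				const struct pattern_config *cfg)
{
	assert(cfg != NULL);
	if (cmdline != PT_UNSPECIFIED)
		return cmdline;
	if (cfg->pattern_type_option != PT_UNSPECIFIED)
		return cfg->pattern_type_option;
	if (cfg->extended_regexp_option)
		return PT_ERE;
	return PT_UNSPECIFIED;
}
```

Note that `grep.extendedRegexp` is only consulted when `grep.patternType` is unset, so a configured `grep.patternType = fixed` is not overridden by `grep.extendedRegexp = true`.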

			`void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt)`
			`{`
			`switch (pattern_type) {`
			`case GREP_PATTERN_TYPE_UNSPECIFIED:`
			`/* fall through */`

			`case GREP_PATTERN_TYPE_BRE:`
			`opt->fixed = 0;`
			`opt->pcre = 0;`
			`opt->regflags &= ~REG_EXTENDED;`
			`break;`

			`case GREP_PATTERN_TYPE_ERE:`
			`opt->fixed = 0;`
			`opt->pcre = 0;`
			`opt->regflags |= REG_EXTENDED;`
			`break;`

			`case GREP_PATTERN_TYPE_FIXED:`
			`opt->fixed = 1;`
			`opt->pcre = 0;`
			`opt->regflags &= ~REG_EXTENDED;`
			`break;`

			`case GREP_PATTERN_TYPE_PCRE:`
			`opt->fixed = 0;`
			`opt->pcre = 1;`
			`opt->regflags &= ~REG_EXTENDED;`
			`break;`
			`}`
			`}`
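
The switch above mostly toggles the `REG_EXTENDED` bit in `regflags`. To see what that bit changes, here is a small sketch using the POSIX `<regex.h>` interface directly. The helper `matches_with_flags()` is a hypothetical name introduced for illustration; it is not part of git.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Compile `pattern` with `regflags` and test it against `text`.
 * Returns 1 on a match, 0 otherwise, and -1 if the pattern
 * does not compile.
 */
int matches_with_flags(const char *pattern, const char *text, int regflags)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);
	/* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
	if (regcomp(&re, pattern, regflags | REG_NOSUB))
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

With `regflags | REG_EXTENDED`, the pattern `ab+c` treats `+` as a repetition operator and should match `abbc`. With `regflags & ~REG_EXTENDED`, POSIX basic syntax treats `+` as an ordinary character, so the same pattern should only match the literal text `ab+c`.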

grep: factor out create_grep_pat() Add create_grep_pat(), a shared helper for all grep pattern allocation and initialization needs. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-20 16:32:39 +02:00			`static struct grep_pat *create_grep_pat(const char *pat, size_t patlen,`
			`const char *origin, int no,`
			`enum grep_pat_token t,`
			`enum grep_header_field field)`
log --author/--committer: really match only with name part When we tried to find commits done by AUTHOR, the first implementation tried to pattern match a line with "^author .AUTHOR", which later was enhanced to strip leading caret and look for "^author AUTHOR" when the search pattern was anchored at the left end (i.e. --author="^AUTHOR"). This had a few problems: When looking for fixed strings (e.g. "git log -F --author=x --grep=y"), the regexp internally used "^author .x" would never match anything; To match at the end (e.g. "git log --author='google.com>$'"), the generated regexp has to also match the trailing timestamp part the commit header lines have. Also, in order to determine if the '$' at the end means "match at the end of the line" or just a literal dollar sign (probably backslash-quoted), we would need to parse the regexp ourselves. An earlier alternative tried to make sure that a line matches "^author " (to limit by field name) and the user supplied pattern at the same time. While it solved the -F problem by introducing a special override for matching the "^author ", it did not solve the trailing timestamp nor tail match problem. It also would have matched every commit if --author=author was asked for, not because the author's email part had this string, but because every commit header line that talks about the author begins with that field name, regardleses of who wrote it. Instead of piling more hacks on top of hacks, this rethinks the grep machinery that is used to look for strings in the commit header, and makes sure that (1) field name matches literally at the beginning of the line, followed by a SP, and (2) the user supplied pattern is matched against the remainder of the line, excluding the trailing timestamp data. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-09-05 07:15:02 +02:00			`{`
			`struct grep_pat *p = xcalloc(1, sizeof(*p));`
grep: support newline separated pattern list Currently, patterns that contain newline characters don't match anything when given to git grep. Regular grep(1) interprets patterns as lists of newline separated search strings instead. Implement this functionality by creating and inserting extra grep_pat structures for patterns consisting of multiple lines when appending to the pattern lists. For simplicity, all pattern strings are duplicated. The original pattern is truncated in place to make it contain only the first line. Requested-by: Torne (Richard Coles) <torne@google.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-20 16:33:07 +02:00			`p->pattern = xmemdupz(pat, patlen);`
			`p->patternlen = patlen;`
			`p->origin = origin;`
			`p->no = no;`
			`p->token = t;`
			`p->field = field;`
			`return p;`
			`}`
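
`create_grep_pat()` copies the pattern with an explicit length rather than relying on `strlen()`, which is what lets patterns containing NUL bytes survive. A rough standalone equivalent of that length-bounded copy is sketched below. `dup_with_nul()` is a hypothetical name; unlike git's `xmemdupz()`, which dies on allocation failure, this version returns `NULL`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy exactly `len` bytes from `src` into a fresh buffer and append
 * a NUL terminator. Embedded NUL bytes in `src` are copied as data.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
char *dup_with_nul(const char *src, size_t len)
{
	char *copy;

	assert(src != NULL);
	copy = malloc(len + 1);
	if (!copy)
		return NULL;
	memcpy(copy, src, len);
	copy[len] = '\0';
	return copy;
}
```

The stored length, not the terminator, is then the authoritative size of the pattern. `strlen()` on the copy stops at the first embedded NUL and can report a smaller value.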

grep: factor out do_append_grep_pat() Add do_append_grep_pat() as a shared function for adding patterns to the header pattern list and the general pattern list. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-20 16:32:54 +02:00			`static void do_append_grep_pat(struct grep_pat ***tail, struct grep_pat *p)`
			`{`
			`**tail = p;`
			`*tail = &p->next;`
			`p->next = NULL;`
			`switch (p->token) {`
			`case GREP_PATTERN: /* atom */`
			`case GREP_PATTERN_HEAD:`
			`case GREP_PATTERN_BODY:`
			`for (;;) {`
			`struct grep_pat *new_pat;`
			`size_t len = 0;`
			`char *cp = p->pattern + p->patternlen, *nl = NULL;`
			`while (++len <= p->patternlen) {`
			`if (*(--cp) == '\n') {`
			`nl = cp;`
			`break;`
			`}`
			`}`
			`if (!nl)`
			`break;`
			`new_pat = create_grep_pat(nl + 1, len - 1, p->origin,`
			`p->no, p->token, p->field);`
			`new_pat->next = p->next;`
			`if (!p->next)`
			`*tail = &new_pat->next;`
			`p->next = new_pat;`
			`*nl = '\0';`
			`p->patternlen -= len;`
			`}`
			`break;`
			`default:`
			`break;`
			`}`
			`}`
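
The loop in `do_append_grep_pat()` splits a multi-line pattern by scanning backwards for the last newline, moving everything after it into a new list node placed right behind the current one, and truncating the original in place. The sketch below reproduces that technique on a simplified list. `struct pat`, `new_pat()`, and `split_on_newlines()` are hypothetical names; the sketch uses plain `malloc()` with error returns and leaves out the list-tail bookkeeping of the original.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced version of struct grep_pat. */
struct pat {
	char *pattern;
	size_t patternlen;
	struct pat *next;
};

/* Allocate a node holding a NUL-terminated copy of s[0..len). */
struct pat *new_pat(const char *s, size_t len)
{
	struct pat *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->pattern = malloc(len + 1);
	if (!p->pattern) {
		free(p);
		return NULL;
	}
	memcpy(p->pattern, s, len);
	p->pattern[len] = '\0';
	p->patternlen = len;
	return p;
}

/*
 * Split p->pattern at every newline. Each pass finds the LAST newline,
 * moves the text after it into a node inserted directly behind p, and
 * truncates p. Returns 0 on success, -1 on allocation failure.
 */
int split_on_newlines(struct pat *p)
{
	assert(p != NULL && p->pattern != NULL);
	for (;;) {
		struct pat *rest;
		size_t len = 0;
		char *cp = p->pattern + p->patternlen, *nl = NULL;

		/* Walk backwards; len counts the bytes visited so far. */
		while (++len <= p->patternlen) {
			if (*(--cp) == '\n') {
				nl = cp;
				break;
			}
		}
		if (!nl)
			return 0;
		rest = new_pat(nl + 1, len - 1);
		if (!rest)
			return -1;
		rest->next = p->next;
		p->next = rest;
		*nl = '\0';
		p->patternlen -= len;
	}
}
```

Because each new node is inserted immediately after `p` while the scan proceeds from the end, the resulting list keeps the lines in their original order without a separate reversal step.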

			`void append_header_grep_pattern(struct grep_opt *opt,`
			`enum grep_header_field field, const char *pat)`
			`{`
			`struct grep_pat *p = create_grep_pat(pat, strlen(pat), "header", 0,`
			`GREP_PATTERN_HEAD, field);`
log --grep-reflog: reject the option without -g Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-29 20:59:52 +02:00			`if (field == GREP_HEADER_REFLOG)`
			`opt->use_reflog_filter = 1;`
			`do_append_grep_pat(&opt->header_tail, p);`
			`}`

			`void append_grep_pattern(struct grep_opt *opt, const char *pat,`
			`const char *origin, int no, enum grep_pat_token t)`
grep: support NUL chars in search strings for -F Search patterns in a file specified with -f can contain NUL characters. The current code ignores all characters on a line after a NUL. Pass the actual length of the line all the way from the pattern file to fixmatch() and use it for case-sensitive fixed string matching. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-22 23:43:43 +02:00			`{`
			`append_grep_pat(opt, pat, strlen(pat), origin, no, t);`
			`}`

			`void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen,`
			`const char *origin, int no, enum grep_pat_token t)`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`{`
grep: factor out create_grep_pat() Add create_grep_pat(), a shared helper for all grep pattern allocation and initialization needs. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-20 16:32:39 +02:00			`struct grep_pat *p = create_grep_pat(pat, patlen, origin, no, t, 0);`
grep: factor out do_append_grep_pat() Add do_append_grep_pat() as a shared function for adding patterns to the header pattern list and the general pattern list. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-20 16:32:54 +02:00			`do_append_grep_pat(&opt->pattern_tail, p);`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`}`

Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`struct grep_opt *grep_opt_dup(const struct grep_opt *opt)`
			`{`
			`struct grep_pat *pat;`
			`struct grep_opt *ret = xmalloc(sizeof(struct grep_opt));`
			`*ret = *opt;`

			`ret->pattern_list = NULL;`
			`ret->pattern_tail = &ret->pattern_list;`

			`for(pat = opt->pattern_list; pat != NULL; pat = pat->next)`
			`{`
			`if(pat->token == GREP_PATTERN_HEAD)`
			`append_header_grep_pattern(ret, pat->field,`
			`pat->pattern);`
			`else`
grep: support NUL chars in search strings for -F Search patterns in a file specified with -f can contain NUL characters. The current code ignores all characters on a line after a NUL. Pass the actual length of the line all the way from the pattern file to fixmatch() and use it for case-sensitive fixed string matching. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-22 23:43:43 +02:00			`append_grep_pat(ret, pat->pattern, pat->patternlen,`
			`pat->origin, pat->no, pat->token);`
Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`}`

			`return ret;`
			`}`
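The duplication above resets `pattern_list` and `pattern_tail` before re-appending every pattern, so that the copy does not share list nodes with the original. A minimal sketch of that tail-pointer idiom follows. The names `struct node`, `struct list`, `list_init`, `list_append` and `list_dup` are hypothetical stand-ins, not git's API, and the sketch assumes that `do_append_grep_pat()` stores the new node through the tail pointer and then advances it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct grep_pat. */
struct node {
	struct node *next;
	int value;
};

/* Hypothetical stand-in for the list fields of struct grep_opt. */
struct list {
	struct node *head;
	struct node **tail; /* address of the next pointer to fill in */
};

static void list_init(struct list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

/* O(1) append: store through the tail pointer, then advance it. */
static void list_append(struct list *l, int value)
{
	struct node *n = calloc(1, sizeof(*n));

	assert(n != NULL);
	n->value = value;
	*l->tail = n;
	l->tail = &n->next;
}

/*
 * Shallow-copy the struct, then rebuild the list so the copy owns
 * its own nodes instead of aliasing the ones in src.
 */
static struct list *list_dup(const struct list *src)
{
	struct list *ret = malloc(sizeof(*ret));
	const struct node *n;

	assert(ret != NULL);
	*ret = *src;
	list_init(ret);
	for (n = src->head; n; n = n->next)
		list_append(ret, n->value);
	return ret;
}
```

Resetting the tail to point at the copy's own head is the important step: without it, the first append on the copy would write into the original list.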

grep: Extract compile_regexp_failed() from compile_regexp() This simplifies compile_regexp() a little and allows re-using error handling code. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 23:52:04 +02:00			`static NORETURN void compile_regexp_failed(const struct grep_pat *p,`
			`const char *error)`
			`{`
			`char where[1024];`

			`if (p->no)`
grep: use xsnprintf to format failure message This looks at first glance like the sprintf can overflow our buffer, but it's actually fine; the p->origin string is something constant and small, like "command line" or "-e option". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-09-24 23:06:51 +02:00			`xsnprintf(where, sizeof(where), "In '%s' at %d, ", p->origin, p->no);`
grep: Extract compile_regexp_failed() from compile_regexp() This simplifies compile_regexp() a little and allows re-using error handling code. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 23:52:04 +02:00			`else if (p->origin)`
grep: use xsnprintf to format failure message This looks at first glance like the sprintf can overflow our buffer, but it's actually fine; the p->origin string is something constant and small, like "command line" or "-e option". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-09-24 23:06:51 +02:00			`xsnprintf(where, sizeof(where), "%s, ", p->origin);`
grep: Extract compile_regexp_failed() from compile_regexp() This simplifies compile_regexp() a little and allows re-using error handling code. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 23:52:04 +02:00			`else`
			`where[0] = 0;`

			`die("%s'%s': %s", where, p->pattern, error);`
			`}`

git-grep: Learn PCRE This patch teaches git-grep the --perl-regexp/-P options (naming borrowed from GNU grep) in order to allow specifying PCRE regexes on the command line. PCRE has a number of features which make them more handy to use than POSIX regexes, like consistent escaping rules, extended character classes, ungreedy matching etc. git isn't build with PCRE support automatically. USE_LIBPCRE environment variable must be enabled (like `make USE_LIBPCRE=YesPlease`). Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 23:52:05 +02:00			`#ifdef USE_LIBPCRE`
			`static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)`
			`{`
			`const char *error;`
			`int erroffset;`
grep -P: Fix matching ^ and $ When "git grep" is run with -P/--perl-regexp, it doesn't match ^ and $ at the beginning/end of the line. This is because PCRE normally matches ^ and $ at the beginning/end of the whole text, not for each line, and "git grep" passes a large chunk of text (possibly containing many lines) to pcre_exec() and then splits the text into lines. This makes "git grep -P" behave differently from "git grep -E" and also from "grep -P" and "pcregrep": $ cat file a b $ git grep --no-index -P '^ ' file $ git grep --no-index -E '^ ' file file: b $ grep -c -P '^ ' file b $ pcregrep -c '^ ' file b Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-25 10:24:28 +01:00			`int options = PCRE_MULTILINE;`
git-grep: Learn PCRE This patch teaches git-grep the --perl-regexp/-P options (naming borrowed from GNU grep) in order to allow specifying PCRE regexes on the command line. PCRE has a number of features which make them more handy to use than POSIX regexes, like consistent escaping rules, extended character classes, ungreedy matching etc. git isn't build with PCRE support automatically. USE_LIBPCRE environment variable must be enabled (like `make USE_LIBPCRE=YesPlease`). Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 23:52:05 +02:00
			`if (opt->ignore_case)`
			`options |= PCRE_CASELESS;`

			`p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset,`
			`NULL);`
			`if (!p->pcre_regexp)`
			`compile_regexp_failed(p, error);`

			`p->pcre_extra_info = pcre_study(p->pcre_regexp, 0, &error);`
			`if (!p->pcre_extra_info && error)`
			`die("%s", error);`
			`}`

			`static int pcrematch(struct grep_pat *p, const char *line, const char *eol,`
			`regmatch_t *match, int eflags)`
			`{`
			`int ovector[30], ret, flags = 0;`

			`if (eflags & REG_NOTBOL)`
			`flags |= PCRE_NOTBOL;`

			`ret = pcre_exec(p->pcre_regexp, p->pcre_extra_info, line, eol - line,`
			`0, flags, ovector, ARRAY_SIZE(ovector));`
			`if (ret < 0 && ret != PCRE_ERROR_NOMATCH)`
			`die("pcre_exec failed with error code %d", ret);`
			`if (ret > 0) {`
			`ret = 0;`
			`match->rm_so = ovector[0];`
			`match->rm_eo = ovector[1];`
			`}`

			`return ret;`
			`}`

			`static void free_pcre_regexp(struct grep_pat *p)`
			`{`
			`pcre_free(p->pcre_regexp);`
			`pcre_free(p->pcre_extra_info);`
			`}`
			`#else /* !USE_LIBPCRE */`
			`static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)`
			`{`
			`die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");`
			`}`

			`static int pcrematch(struct grep_pat *p, const char *line, const char *eol,`
			`regmatch_t *match, int eflags)`
			`{`
			`return 1;`
			`}`

			`static void free_pcre_regexp(struct grep_pat *p)`
			`{`
			`}`
			`#endif /* !USE_LIBPCRE */`

Use kwset in grep Benchmarks for the hot cache case: before: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 3,478,085 cache-misses # 2.322 M/sec ( +- 2.690% ) 11,356,177 cache-references # 7.582 M/sec ( +- 2.598% ) 3,872,184 branch-misses # 0.363 % ( +- 0.258% ) 1,067,367,848 branches # 712.673 M/sec ( +- 2.622% ) 3,828,370,782 instructions # 0.947 IPC ( +- 0.033% ) 4,043,832,831 cycles # 2700.037 M/sec ( +- 0.167% ) 8,518 page-faults # 0.006 M/sec ( +- 3.648% ) 847 CPU-migrations # 0.001 M/sec ( +- 3.262% ) 6,546 context-switches # 0.004 M/sec ( +- 2.292% ) 1497.695495 task-clock-msecs # 3.303 CPUs ( +- 2.550% ) 0.453394396 seconds time elapsed ( +- 0.912% ) after: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 2,989,918 cache-misses # 3.166 M/sec ( +- 5.013% ) 10,986,041 cache-references # 11.633 M/sec ( +- 4.899% ) (scaled from 95.06%) 3,511,993 branch-misses # 1.422 % ( +- 0.785% ) 246,893,561 branches # 261.433 M/sec ( +- 3.967% ) 1,392,727,757 instructions # 0.564 IPC ( +- 0.040% ) 2,468,142,397 cycles # 2613.494 M/sec ( +- 0.110% ) 7,747 page-faults # 0.008 M/sec ( +- 3.995% ) 897 CPU-migrations # 0.001 M/sec ( +- 2.383% ) 6,535 context-switches # 0.007 M/sec ( +- 1.993% ) 944.384228 task-clock-msecs # 3.177 CPUs ( +- 0.268% ) 0.297257643 seconds time elapsed ( +- 0.450% ) So we gain about 35% by using the kwset code. As a side effect of using kwset two grep tests are fixed by this patch. The first is fixed because kwset can deal with case-insensitive search containing NULs, something strcasestr cannot do. The second one is fixed because we consider patterns containing NULs as fixed strings (regcomp cannot accept patterns with NULs). Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:42:18 +02:00			`static int is_fixed(const char *s, size_t len)`
			`{`
			`size_t i;`

			`/* regcomp cannot accept patterns with NULs so we`
			`* consider any pattern containing a NUL fixed.`
			`*/`
			`if (memchr(s, 0, len))`
			`return 1;`

			`for (i = 0; i < len; i++) {`
			`if (is_regex_special(s[i]))`
			`return 0;`
			`}`

			`return 1;`
			`}`
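The check above can be tried in isolation with the sketch below. `is_special_sketch()` is a hypothetical stand-in for `is_regex_special()`, and the character set it tests is an assumption for illustration only; the real helper may classify characters differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for is_regex_special(); the set of
 * characters treated as special here is an assumption.
 */
static int is_special_sketch(char c)
{
	/* strchr() would match the terminator for c == 0, so exclude it. */
	return c != '\0' && strchr("$()*+.?[\\^{|", c) != NULL;
}

/* Returns 1 if the pattern can be searched as a literal string. */
static int is_fixed_sketch(const char *s, size_t len)
{
	size_t i;

	assert(s != NULL);

	/* A pattern with an embedded NUL cannot be given to regcomp(). */
	if (memchr(s, 0, len))
		return 1;

	for (i = 0; i < len; i++)
		if (is_special_sketch(s[i]))
			return 0;

	return 1;
}
```

Because the length is passed explicitly, a pattern such as the four bytes `a`, NUL, `.`, `c` is classified as fixed even though it contains a `.` after the NUL.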

builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)`
			`{`
grep: don't call regexec() for fixed strings Add the new flag "fixed" to struct grep_pat and set it if the pattern is doesn't contain any regex control characters in addition to if the flag -F/--fixed-strings was specified. This gives a nice speed up on msysgit, where regexec() seems to be extra slow. Before (best of five runs): $ time git grep grep v1.6.1 >/dev/null real 0m0.552s user 0m0.000s sys 0m0.000s $ time git grep -F grep v1.6.1 >/dev/null real 0m0.170s user 0m0.000s sys 0m0.015s With the patch: $ time git grep grep v1.6.1 >/dev/null real 0m0.173s user 0m0.000s sys 0m0.000s The difference is much smaller on Linux, but still measurable. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-10 00:18:34 +01:00			`int err;`

grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00			`p->word_regexp = opt->word_regexp;`
grep: Allow case insensitive search of fixed-strings "git grep" currently an error when you combine the -F and -i flags. This isn't in line with how GNU grep handles it. This patch allows the simultaneous use of those flags. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Brian Collins <bricollins@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-11-06 10:22:35 +01:00			`p->ignore_case = opt->ignore_case;`
grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00
Use kwset in grep Benchmarks for the hot cache case: before: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 3,478,085 cache-misses # 2.322 M/sec ( +- 2.690% ) 11,356,177 cache-references # 7.582 M/sec ( +- 2.598% ) 3,872,184 branch-misses # 0.363 % ( +- 0.258% ) 1,067,367,848 branches # 712.673 M/sec ( +- 2.622% ) 3,828,370,782 instructions # 0.947 IPC ( +- 0.033% ) 4,043,832,831 cycles # 2700.037 M/sec ( +- 0.167% ) 8,518 page-faults # 0.006 M/sec ( +- 3.648% ) 847 CPU-migrations # 0.001 M/sec ( +- 3.262% ) 6,546 context-switches # 0.004 M/sec ( +- 2.292% ) 1497.695495 task-clock-msecs # 3.303 CPUs ( +- 2.550% ) 0.453394396 seconds time elapsed ( +- 0.912% ) after: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 2,989,918 cache-misses # 3.166 M/sec ( +- 5.013% ) 10,986,041 cache-references # 11.633 M/sec ( +- 4.899% ) (scaled from 95.06%) 3,511,993 branch-misses # 1.422 % ( +- 0.785% ) 246,893,561 branches # 261.433 M/sec ( +- 3.967% ) 1,392,727,757 instructions # 0.564 IPC ( +- 0.040% ) 2,468,142,397 cycles # 2613.494 M/sec ( +- 0.110% ) 7,747 page-faults # 0.008 M/sec ( +- 3.995% ) 897 CPU-migrations # 0.001 M/sec ( +- 2.383% ) 6,535 context-switches # 0.007 M/sec ( +- 1.993% ) 944.384228 task-clock-msecs # 3.177 CPUs ( +- 0.268% ) 0.297257643 seconds time elapsed ( +- 0.450% ) So we gain about 35% by using the kwset code. As a side effect of using kwset two grep tests are fixed by this patch. The first is fixed because kwset can deal with case-insensitive search containing NULs, something strcasestr cannot do. The second one is fixed because we consider patterns containing NULs as fixed strings (regcomp cannot accept patterns with NULs). Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:42:18 +02:00			`if (opt->fixed || is_fixed(p->pattern, p->patternlen))`
			`p->fixed = 1;`
			`else`
			`p->fixed = 0;`

			`if (p->fixed) {`
grep: use static trans-case table In order to prepare the kwset machinery for a case-insensitive search, we used to use a static table of 256 elements and filled it every time before calling kwsalloc(). Because the kwset machinery will never modify this table, just allocate a single instance globally and fill it at the compile time. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-28 23:20:53 +01:00			`if (opt->regflags & REG_ICASE || p->ignore_case)`
			`p->kws = kwsalloc(tolower_trans_tbl);`
			`else`
Use kwset in grep Benchmarks for the hot cache case: before: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 3,478,085 cache-misses # 2.322 M/sec ( +- 2.690% ) 11,356,177 cache-references # 7.582 M/sec ( +- 2.598% ) 3,872,184 branch-misses # 0.363 % ( +- 0.258% ) 1,067,367,848 branches # 712.673 M/sec ( +- 2.622% ) 3,828,370,782 instructions # 0.947 IPC ( +- 0.033% ) 4,043,832,831 cycles # 2700.037 M/sec ( +- 0.167% ) 8,518 page-faults # 0.006 M/sec ( +- 3.648% ) 847 CPU-migrations # 0.001 M/sec ( +- 3.262% ) 6,546 context-switches # 0.004 M/sec ( +- 2.292% ) 1497.695495 task-clock-msecs # 3.303 CPUs ( +- 2.550% ) 0.453394396 seconds time elapsed ( +- 0.912% ) after: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 2,989,918 cache-misses # 3.166 M/sec ( +- 5.013% ) 10,986,041 cache-references # 11.633 M/sec ( +- 4.899% ) (scaled from 95.06%) 3,511,993 branch-misses # 1.422 % ( +- 0.785% ) 246,893,561 branches # 261.433 M/sec ( +- 3.967% ) 1,392,727,757 instructions # 0.564 IPC ( +- 0.040% ) 2,468,142,397 cycles # 2613.494 M/sec ( +- 0.110% ) 7,747 page-faults # 0.008 M/sec ( +- 3.995% ) 897 CPU-migrations # 0.001 M/sec ( +- 2.383% ) 6,535 context-switches # 0.007 M/sec ( +- 1.993% ) 944.384228 task-clock-msecs # 3.177 CPUs ( +- 0.268% ) 0.297257643 seconds time elapsed ( +- 0.450% ) So we gain about 35% by using the kwset code. As a side effect of using kwset two grep tests are fixed by this patch. The first is fixed because kwset can deal with case-insensitive search containing NULs, something strcasestr cannot do. The second one is fixed because we consider patterns containing NULs as fixed strings (regcomp cannot accept patterns with NULs). Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:42:18 +02:00			`p->kws = kwsalloc(NULL);`
			`kwsincr(p->kws, p->pattern, p->patternlen);`
			`kwsprep(p->kws);`
grep: don't call regexec() for fixed strings Add the new flag "fixed" to struct grep_pat and set it if the pattern is doesn't contain any regex control characters in addition to if the flag -F/--fixed-strings was specified. This gives a nice speed up on msysgit, where regexec() seems to be extra slow. Before (best of five runs): $ time git grep grep v1.6.1 >/dev/null real 0m0.552s user 0m0.000s sys 0m0.000s $ time git grep -F grep v1.6.1 >/dev/null real 0m0.170s user 0m0.000s sys 0m0.015s With the patch: $ time git grep grep v1.6.1 >/dev/null real 0m0.173s user 0m0.000s sys 0m0.000s The difference is much smaller on Linux, but still measurable. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-10 00:18:34 +01:00			`return;`
Use kwset in grep Benchmarks for the hot cache case: before: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 3,478,085 cache-misses # 2.322 M/sec ( +- 2.690% ) 11,356,177 cache-references # 7.582 M/sec ( +- 2.598% ) 3,872,184 branch-misses # 0.363 % ( +- 0.258% ) 1,067,367,848 branches # 712.673 M/sec ( +- 2.622% ) 3,828,370,782 instructions # 0.947 IPC ( +- 0.033% ) 4,043,832,831 cycles # 2700.037 M/sec ( +- 0.167% ) 8,518 page-faults # 0.006 M/sec ( +- 3.648% ) 847 CPU-migrations # 0.001 M/sec ( +- 3.262% ) 6,546 context-switches # 0.004 M/sec ( +- 2.292% ) 1497.695495 task-clock-msecs # 3.303 CPUs ( +- 2.550% ) 0.453394396 seconds time elapsed ( +- 0.912% ) after: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 2,989,918 cache-misses # 3.166 M/sec ( +- 5.013% ) 10,986,041 cache-references # 11.633 M/sec ( +- 4.899% ) (scaled from 95.06%) 3,511,993 branch-misses # 1.422 % ( +- 0.785% ) 246,893,561 branches # 261.433 M/sec ( +- 3.967% ) 1,392,727,757 instructions # 0.564 IPC ( +- 0.040% ) 2,468,142,397 cycles # 2613.494 M/sec ( +- 0.110% ) 7,747 page-faults # 0.008 M/sec ( +- 3.995% ) 897 CPU-migrations # 0.001 M/sec ( +- 2.383% ) 6,535 context-switches # 0.007 M/sec ( +- 1.993% ) 944.384228 task-clock-msecs # 3.177 CPUs ( +- 0.268% ) 0.297257643 seconds time elapsed ( +- 0.450% ) So we gain about 35% by using the kwset code. As a side effect of using kwset two grep tests are fixed by this patch. The first is fixed because kwset can deal with case-insensitive search containing NULs, something strcasestr cannot do. The second one is fixed because we consider patterns containing NULs as fixed strings (regcomp cannot accept patterns with NULs). Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:42:18 +02:00			`}`
grep: don't call regexec() for fixed strings Add the new flag "fixed" to struct grep_pat and set it if the pattern is doesn't contain any regex control characters in addition to if the flag -F/--fixed-strings was specified. This gives a nice speed up on msysgit, where regexec() seems to be extra slow. Before (best of five runs): $ time git grep grep v1.6.1 >/dev/null real 0m0.552s user 0m0.000s sys 0m0.000s $ time git grep -F grep v1.6.1 >/dev/null real 0m0.170s user 0m0.000s sys 0m0.015s With the patch: $ time git grep grep v1.6.1 >/dev/null real 0m0.173s user 0m0.000s sys 0m0.000s The difference is much smaller on Linux, but still measurable. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-10 00:18:34 +01:00
git-grep: Learn PCRE This patch teaches git-grep the --perl-regexp/-P options (naming borrowed from GNU grep) in order to allow specifying PCRE regexes on the command line. PCRE has a number of features which make them more handy to use than POSIX regexes, like consistent escaping rules, extended character classes, ungreedy matching etc. git isn't build with PCRE support automatically. USE_LIBPCRE environment variable must be enabled (like `make USE_LIBPCRE=YesPlease`). Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 23:52:05 +02:00			`if (opt->pcre) {`
			`compile_pcre_regexp(p, opt);`
			`return;`
			`}`

grep: don't call regexec() for fixed strings Add the new flag "fixed" to struct grep_pat and set it if the pattern is doesn't contain any regex control characters in addition to if the flag -F/--fixed-strings was specified. This gives a nice speed up on msysgit, where regexec() seems to be extra slow. Before (best of five runs): $ time git grep grep v1.6.1 >/dev/null real 0m0.552s user 0m0.000s sys 0m0.000s $ time git grep -F grep v1.6.1 >/dev/null real 0m0.170s user 0m0.000s sys 0m0.015s With the patch: $ time git grep grep v1.6.1 >/dev/null real 0m0.173s user 0m0.000s sys 0m0.000s The difference is much smaller on Linux, but still measurable. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-10 00:18:34 +01:00			`err = regcomp(&p->regexp, p->pattern, opt->regflags);`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`if (err) {`
			`char errbuf[1024];`
			`regerror(err, &p->regexp, errbuf, 1024);`
			`regfree(&p->regexp);`
grep: Extract compile_regexp_failed() from compile_regexp() This simplifies compile_regexp() a little and allows re-using error handling code. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 23:52:04 +02:00			`compile_regexp_failed(p, errbuf);`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`}`
			`}`
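The POSIX branch at the end of `compile_regexp()` follows the usual `regcomp()` / `regerror()` pattern: compile, and on failure turn the error code into a readable message. A minimal sketch of that pattern is shown below. `compile_or_describe()` is a hypothetical helper that returns the error to its caller instead of calling `die()`, and unlike the code above it does not call `regfree()` on a pattern that failed to compile.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile `pattern` into `re`. On failure,
 * write a human-readable message into errbuf and return the
 * non-zero regcomp() error code; on success return 0.
 */
static int compile_or_describe(regex_t *re, const char *pattern, int cflags,
			       char *errbuf, size_t errlen)
{
	int err;

	assert(re != NULL && pattern != NULL);

	err = regcomp(re, pattern, cflags);
	if (err) {
		/* regerror() always NUL-terminates when errlen > 0. */
		regerror(err, re, errbuf, errlen);
		return err;
	}

	if (errlen)
		errbuf[0] = '\0';
	return 0;
}
```

A caller would pass the same kind of flags that `opt->regflags` carries, for example `REG_EXTENDED`, and must call `regfree()` on the compiled pattern once it is no longer needed.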

grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`static struct grep_expr *compile_pattern_or(struct grep_pat **);`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`static struct grep_expr *compile_pattern_atom(struct grep_pat **list)`
			`{`
			`struct grep_pat *p;`
			`struct grep_expr *x;`

			`p = *list;`
grep: fix segfault when "git grep '('" is given Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-27 20:10:24 +02:00			`if (!p)`
			`return NULL;`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`switch (p->token) {`
			`case GREP_PATTERN: /* atom */`
Update grep internal for grepping only in head/body This further updates the built-in grep engine so that we can say something like "this pattern should match only in head". This can be used to simplify grepping in the log messages. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-20 21:39:46 +02:00			`case GREP_PATTERN_HEAD:`
			`case GREP_PATTERN_BODY:`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`x = xcalloc(1, sizeof (struct grep_expr));`
			`x->node = GREP_NODE_ATOM;`
			`x->u.atom = p;`
			`*list = p->next;`
			`return x;`
			`case GREP_OPEN_PAREN:`
			`*list = p->next;`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`x = compile_pattern_or(list);`
			`if (!*list || (*list)->token != GREP_CLOSE_PAREN)`
			`die("unmatched parenthesis");`
			`*list = (*list)->next;`
			`return x;`
			`default:`
			`return NULL;`
			`}`
			`}`

			`static struct grep_expr *compile_pattern_not(struct grep_pat **list)`
			`{`
			`struct grep_pat *p;`
			`struct grep_expr *x;`

			`p = *list;`
grep: fix segfault when "git grep '('" is given Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-27 20:10:24 +02:00			`if (!p)`
			`return NULL;`
			`switch (p->token) {`
			`case GREP_NOT:`
			`if (!p->next)`
			`die("--not not followed by pattern expression");`
			`*list = p->next;`
			`x = xcalloc(1, sizeof (struct grep_expr));`
			`x->node = GREP_NODE_NOT;`
			`x->u.unary = compile_pattern_not(list);`
			`if (!x->u.unary)`
			`die("--not followed by non pattern expression");`
			`return x;`
			`default:`
			`return compile_pattern_atom(list);`
			`}`
			`}`

			`static struct grep_expr *compile_pattern_and(struct grep_pat **list)`
			`{`
			`struct grep_pat *p;`
			`struct grep_expr *x, *y, *z;`

			`x = compile_pattern_not(list);`
			`p = *list;`
			`if (p && p->token == GREP_AND) {`
			`if (!p->next)`
			`die("--and not followed by pattern expression");`
			`*list = p->next;`
			`y = compile_pattern_and(list);`
			`if (!y)`
			`die("--and not followed by pattern expression");`
			`z = xcalloc(1, sizeof (struct grep_expr));`
			`z->node = GREP_NODE_AND;`
			`z->u.binary.left = x;`
			`z->u.binary.right = y;`
			`return z;`
			`}`
			`return x;`
			`}`

			`static struct grep_expr *compile_pattern_or(struct grep_pat **list)`
			`{`
			`struct grep_pat *p;`
			`struct grep_expr *x, *y, *z;`

			`x = compile_pattern_and(list);`
			`p = *list;`
			`if (x && p && p->token != GREP_CLOSE_PAREN) {`
			`y = compile_pattern_or(list);`
			`if (!y)`
			`die("not a pattern expression %s", p->pattern);`
			`z = xcalloc(1, sizeof (struct grep_expr));`
			`z->node = GREP_NODE_OR;`
			`z->u.binary.left = x;`
			`z->u.binary.right = y;`
			`return z;`
			`}`
			`return x;`
			`}`

			`static struct grep_expr *compile_pattern_expr(struct grep_pat **list)`
			`{`
			`return compile_pattern_or(list);`
			`}`
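
The four functions above form a small recursive-descent parser: `--not` binds tightest, then `--and`, and terms that simply follow one another are OR-ed together. The block below is a minimal, self-contained sketch of that shape, not the code from grep.c. The names `tok`, `kind`, `pat`, `expr`, `mk` and `parse_tokens` are hypothetical, errors return NULL instead of calling die(), and nodes come from a fixed pool rather than xcalloc().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the grep_pat tokens and grep_expr nodes. */
enum tok { T_ATOM, T_NOT, T_AND, T_OPEN, T_CLOSE };
enum kind { N_ATOM, N_NOT, N_AND, N_OR };

struct pat { enum tok token; struct pat *next; };
struct expr { enum kind node; struct expr *left, *right; };	/* N_NOT uses left only */

/* A fixed pool keeps the sketch free of cleanup code; grep.c itself uses xcalloc(). */
static struct expr pool[64];
static int used;

static struct expr *mk(enum kind node, struct expr *l, struct expr *r)
{
	assert(used < 64);
	pool[used] = (struct expr){ node, l, r };
	return &pool[used++];
}

static struct expr *parse_or(struct pat **list);

static struct expr *parse_atom(struct pat **list)
{
	struct pat *p = *list;
	struct expr *x;

	if (p && p->token == T_ATOM) {
		*list = p->next;
		return mk(N_ATOM, NULL, NULL);
	}
	if (p && p->token == T_OPEN) {
		*list = p->next;
		x = parse_or(list);
		if (!*list || (*list)->token != T_CLOSE)
			return NULL;	/* unmatched parenthesis; the real parser die()s */
		*list = (*list)->next;
		return x;
	}
	return NULL;
}

static struct expr *parse_not(struct pat **list)
{
	struct expr *x;

	if (!*list || (*list)->token != T_NOT)
		return parse_atom(list);
	*list = (*list)->next;
	x = parse_not(list);
	return x ? mk(N_NOT, x, NULL) : NULL;
}

static struct expr *parse_and(struct pat **list)
{
	struct expr *x = parse_not(list), *y;
	struct pat *p = *list;

	if (!x || !p || p->token != T_AND)
		return x;
	*list = p->next;
	y = parse_and(list);
	return y ? mk(N_AND, x, y) : NULL;
}

static struct expr *parse_or(struct pat **list)
{
	struct expr *x = parse_and(list), *y;
	struct pat *p = *list;

	/* No OR token is consumed: any term that follows is OR-ed in. */
	if (!x || !p || p->token == T_CLOSE)
		return x;
	y = parse_or(list);
	return y ? mk(N_OR, x, y) : NULL;
}

/* Link an array of tokens into a list and parse it; NULL on a parse error. */
static struct expr *parse_tokens(struct pat *toks, int n)
{
	struct pat *list = n ? toks : NULL;
	struct expr *x;
	int i;

	used = 0;
	for (i = 0; i < n; i++)
		toks[i].next = (i + 1 < n) ? &toks[i + 1] : NULL;
	x = parse_or(&list);
	return list ? NULL : x;	/* leftover tokens mean an incomplete expression */
}
```

For a token sequence shaped like `A B --and C`, `parse_tokens()` in this sketch yields an OR node whose right child is the AND of B and C, because the AND level is parsed before the OR level.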

grep: teach --debug option to dump the parse tree Our "grep" allows complex boolean expressions to be formed to match each individual line with operators like --and, '(', ')' and --not. Introduce the "--debug" option to show the parse tree to help people who want to debug and enhance it. Also "log" learns "--grep-debug" option to do the same. The command line parser to the log family is a lot more limited than the general "git grep" parser, but it has special handling for header matching (e.g. "--author"), and a parse tree is valuable when working on it. Note that "--all-match" is not any individual node in the parse tree. It is an instruction to the evaluator to check all the nodes in the top-level backbone have matched and reject a document as non-matching otherwise. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-13 23:21:44 +02:00			`static void indent(int in)`
			`{`
			`while (in-- > 0)`
			`fputc(' ', stderr);`
			`}`

			`static void dump_grep_pat(struct grep_pat *p)`
			`{`
			`switch (p->token) {`
			`case GREP_AND: fprintf(stderr, "and"); break;`
			`case GREP_OPEN_PAREN: fprintf(stderr, "("); break;`
			`case GREP_CLOSE_PAREN: fprintf(stderr, ")"); break;`
			`case GREP_NOT: fprintf(stderr, "not"); break;`
			`case GREP_OR: fprintf(stderr, "or"); break;`

			`case GREP_PATTERN: fprintf(stderr, "pattern"); break;`
			`case GREP_PATTERN_HEAD: fprintf(stderr, "pattern_head"); break;`
			`case GREP_PATTERN_BODY: fprintf(stderr, "pattern_body"); break;`
			`}`

			`switch (p->token) {`
			`default: break;`
			`case GREP_PATTERN_HEAD:`
			`fprintf(stderr, "<head %d>", p->field); break;`
			`case GREP_PATTERN_BODY:`
			`fprintf(stderr, "<body>"); break;`
			`}`
			`switch (p->token) {`
			`default: break;`
			`case GREP_PATTERN_HEAD:`
			`case GREP_PATTERN_BODY:`
			`case GREP_PATTERN:`
			`fprintf(stderr, "%.*s", (int)p->patternlen, p->pattern);`
			`break;`
			`}`
			`fputc('\n', stderr);`
			`}`

			`static void dump_grep_expression_1(struct grep_expr *x, int in)`
			`{`
			`indent(in);`
			`switch (x->node) {`
			`case GREP_NODE_TRUE:`
			`fprintf(stderr, "true\n");`
			`break;`
			`case GREP_NODE_ATOM:`
			`dump_grep_pat(x->u.atom);`
			`break;`
			`case GREP_NODE_NOT:`
			`fprintf(stderr, "(not\n");`
			`dump_grep_expression_1(x->u.unary, in+1);`
			`indent(in);`
			`fprintf(stderr, ")\n");`
			`break;`
			`case GREP_NODE_AND:`
			`fprintf(stderr, "(and\n");`
			`dump_grep_expression_1(x->u.binary.left, in+1);`
			`dump_grep_expression_1(x->u.binary.right, in+1);`
			`indent(in);`
			`fprintf(stderr, ")\n");`
			`break;`
			`case GREP_NODE_OR:`
			`fprintf(stderr, "(or\n");`
			`dump_grep_expression_1(x->u.binary.left, in+1);`
			`dump_grep_expression_1(x->u.binary.right, in+1);`
			`indent(in);`
			`fprintf(stderr, ")\n");`
			`break;`
			`}`
			`}`

grep.c: mark private file-scope symbols as static Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-15 23:04:36 +02:00			`static void dump_grep_expression(struct grep_opt *opt)`
			`{`
			`struct grep_expr *x = opt->pattern_expression;`

			`if (opt->all_match)`
			`fprintf(stderr, "[all-match]\n");`
			`dump_grep_expression_1(x, 0);`
			`fflush(NULL);`
			`}`

log --author: take union of multiple "author" requests In the olden days, log --author=me --committer=him --grep=this --grep=that used to be turned into: (OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (PATTERN this) (PATTERN that)) showing my patches that do not have any "this" nor "that", which was totally useless. 80235ba ("log --author=me --grep=it" should find intersection, not union, 2010-01-17) improved it greatly to turn the same into: (ALL-MATCH (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that))) That is, "show only patches by me and committed by him, that have either this or that", which is a lot more natural thing to ask. We however need to be a bit more clever when the user asks more than one "author" (or "committer"); because a commit has only one author (and one committer), they ought to be interpreted as asking for union to be useful. The current implementation simply added another author/committer pattern at the same top-level for ALL-MATCH to insist on matching all, finding nothing. Turn log --author=me --author=her \ --committer=him --committer=you \ --grep=this --grep=that into (ALL-MATCH (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her)) (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you)) (OR (PATTERN this) (PATTERN that))) instead. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-09-13 07:15:35 +02:00			`static struct grep_expr *grep_true_expr(void)`
			`{`
			`struct grep_expr *z = xcalloc(1, sizeof(*z));`
			`z->node = GREP_NODE_TRUE;`
			`return z;`
			`}`

			`static struct grep_expr *grep_or_expr(struct grep_expr *left, struct grep_expr *right)`
			`{`
			`struct grep_expr *z = xcalloc(1, sizeof(*z));`
			`z->node = GREP_NODE_OR;`
			`z->u.binary.left = left;`
			`z->u.binary.right = right;`
			`return z;`
			`}`

grep: move logic to compile header pattern into a separate helper The callers should be queuing only GREP_PATTERN_HEAD elements to the header_list queue; simplify the switch and guard it with an assert. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-09-13 04:30:48 +02:00			`static struct grep_expr *prep_header_patterns(struct grep_opt *opt)`
			`{`
			`struct grep_pat *p;`
			`struct grep_expr *header_expr;`
			`struct grep_expr *(header_group[GREP_HEADER_FIELD_MAX]);`
			`enum grep_header_field fld;`

			`if (!opt->header_list)`
			`return NULL;`
grep.c: remove redundant line of code Signed-off-by: Angus Hammond <angusgh@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-06 20:17:15 +02:00
			`for (p = opt->header_list; p; p = p->next) {`
			`if (p->token != GREP_PATTERN_HEAD)`
			`die("bug: a non-header pattern in grep header list.");`
fix clang -Wtautological-compare with unsigned enum Create a GREP_HEADER_FIELD_MIN so we can check that the field value is sane and silence the clang warning. Clang warning happens because the enum is unsigned (this is implementation-defined, and there is no negative fields) and the check is then tautological. Signed-off-by: Antoine Pelisse <apelisse@gmail.com> Signed-off-by: John Keeping <john@keeping.me.uk> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-02-03 15:37:09 +01:00			`if (p->field < GREP_HEADER_FIELD_MIN ||`
			`GREP_HEADER_FIELD_MAX <= p->field)`
			`die("bug: unknown header field %d", p->field);`
			`compile_regexp(p, opt);`
"log --author=me --grep=it" should find intersection, not union Historically, any grep filter in "git log" family of commands were taken as restricting to commits with any of the words in the commit log message. However, the user almost always want to find commits "done by this person on that topic". With "--all-match" option, a series of grep patterns can be turned into a requirement that all of them must produce a match, but that makes it impossible to ask for "done by me, on either this or that" with: log --author=me --committer=him --grep=this --grep=that because it will require both "this" and "that" to appear. Change the "header" parser of grep library to treat the headers specially, and parse it as: (all-match-OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that) ) ) Even though the "log" command line parser doesn't give direct access to the extended grep syntax to group terms with parentheses, this change will cover the majority of the case the users would want. This incidentally revealed that one test in t7002 was bogus. It ran: log --author=Thor --grep=Thu --format='%s' and expected (wrongly) "Thu" to match "Thursday" in the author/committer date, but that would never match, as the timestamp in raw commit buffer does not have the name of the day-of-the-week. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-18 05:09:06 +01:00			`}`

			`for (fld = 0; fld < GREP_HEADER_FIELD_MAX; fld++)`
			`header_group[fld] = NULL;`

			`for (p = opt->header_list; p; p = p->next) {`
			`struct grep_expr *h;`
			`struct grep_pat *pp = p;`

			`h = compile_pattern_atom(&pp);`
			`if (!h || pp != p->next)`
			`die("bug: malformed header expr");`
			`if (!header_group[p->field]) {`
			`header_group[p->field] = h;`
			`continue;`
			`}`
			`header_group[p->field] = grep_or_expr(h, header_group[p->field]);`
			`}`

			`header_expr = NULL;`

			`for (fld = 0; fld < GREP_HEADER_FIELD_MAX; fld++) {`
			`if (!header_group[fld])`
			`continue;`
			`if (!header_expr)`
			`header_expr = grep_true_expr();`
			`header_expr = grep_or_expr(header_group[fld], header_expr);`
			`}`
			`return header_expr;`
			`}`

log --grep/--author: honor --all-match honored for multiple --grep patterns When we have both header expression (which has to be an OR node by construction) and a pattern expression (which could be anything), we create a new top-level OR node to bind them together, and the resulting expression structure looks like this: OR / \ / \ pattern OR / \ / \ ..... committer OR / \ author TRUE The three elements on the top-level backbone that are inspected by the "all-match" logic are "pattern", "committer" and "author". When there are more than one elements in the "pattern", the top-level node of the "pattern" part of the subtree is an OR, and that node is inspected by "all-match". The result ends up ignoring the "--all-match" given from the command line. A match on either side of the pattern is considered a match, hence: git log --grep=A --grep=B --author=C --all-match shows the same "authored by C and has either A or B" that is correct only when run without "--all-match". Fix this by turning the resulting expression around when "--all-match" is in effect, like this: OR / \ / \ / OR committer / \ author \ pattern The set of nodes on the top-level backbone in the resulting expression becomes "committer", "author", and the nodes that are on the top-level backbone of the "pattern" subexpression. This makes the "all-match" logic inspect the same nodes in "pattern" as the case without the author and/or the committer restriction, and makes the earlier "log" example to show "authored by C and has A and has B", which is what the command line expects. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-14 01:26:57 +02:00			`static struct grep_expr *grep_splice_or(struct grep_expr *x, struct grep_expr *y)`
			`{`
			`struct grep_expr *z = x;`

			`while (x) {`
			`assert(x->node == GREP_NODE_OR);`
			`if (x->u.binary.right &&`
			`x->u.binary.right->node == GREP_NODE_TRUE) {`
			`x->u.binary.right = y;`
			`break;`
			`}`
			`x = x->u.binary.right;`
			`}`
			`return z;`
			`}`
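
The commit message attached to grep_splice_or() describes replacing the TRUE terminator of the header OR chain with the pattern expression, so that the top-level backbone becomes "committer", "author" and the backbone of the pattern subtree. The block below is a rough sketch of that rearrangement on a pared-down node type. The names `kind`, `expr`, `splice_or`, `backbone_length` and `example` are hypothetical, and `backbone_length` assumes the backbone consists of the left child of each OR on the right-leaning chain plus the final non-OR node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down node type; only what the splice needs. */
enum kind { N_TRUE, N_ATOM, N_OR };
struct expr { enum kind node; const char *name; struct expr *left, *right; };

/* Replace the TRUE terminator of a right-leaning OR chain with y. */
static struct expr *splice_or(struct expr *x, struct expr *y)
{
	struct expr *z = x;

	while (x) {
		assert(x->node == N_OR);
		if (x->right && x->right->node == N_TRUE) {
			x->right = y;
			break;
		}
		x = x->right;
	}
	return z;
}

/* Assumed backbone: the left child of each chained OR, plus the last node. */
static int backbone_length(const struct expr *x)
{
	int n = 1;	/* the final non-OR node */

	for (; x->node == N_OR; x = x->right)
		n++;	/* the left child of this OR */
	return n;
}

/* Build OR(committer, OR(author, TRUE)) and splice OR(A, B) over the TRUE. */
static struct expr *example(void)
{
	static struct expr t = { N_TRUE, "true", NULL, NULL };
	static struct expr a = { N_ATOM, "A", NULL, NULL };
	static struct expr b = { N_ATOM, "B", NULL, NULL };
	static struct expr author = { N_ATOM, "author", NULL, NULL };
	static struct expr committer = { N_ATOM, "committer", NULL, NULL };
	static struct expr pattern, inner, outer;

	pattern = (struct expr){ N_OR, NULL, &a, &b };
	inner = (struct expr){ N_OR, NULL, &author, &t };
	outer = (struct expr){ N_OR, NULL, &committer, &inner };
	return splice_or(&outer, &pattern);
}
```

Under that assumption the spliced tree has four backbone nodes (committer, author, A, B), whereas binding the pattern subtree as the left child of a new top-level OR would expose it as a single node.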

			`static void compile_grep_patterns_real(struct grep_opt *opt)`
			`{`
			`struct grep_pat *p;`
			`struct grep_expr *header_expr = prep_header_patterns(opt);`

			`for (p = opt->pattern_list; p; p = p->next) {`
			`switch (p->token) {`
			`case GREP_PATTERN: /* atom */`
			`case GREP_PATTERN_HEAD:`
			`case GREP_PATTERN_BODY:`
grep: don't call regexec() for fixed strings Add the new flag "fixed" to struct grep_pat and set it if the pattern is doesn't contain any regex control characters in addition to if the flag -F/--fixed-strings was specified. This gives a nice speed up on msysgit, where regexec() seems to be extra slow. Before (best of five runs): $ time git grep grep v1.6.1 >/dev/null real 0m0.552s user 0m0.000s sys 0m0.000s $ time git grep -F grep v1.6.1 >/dev/null real 0m0.170s user 0m0.000s sys 0m0.015s With the patch: $ time git grep grep v1.6.1 >/dev/null real 0m0.173s user 0m0.000s sys 0m0.000s The difference is much smaller on Linux, but still measurable. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-10 00:18:34 +01:00			`compile_regexp(p, opt);`
			`break;`
			`default:`
			`opt->extended = 1;`
			`break;`
			`}`
			`}`

			`if (opt->all_match || header_expr)`
			`opt->extended = 1;`
			`else if (!opt->extended && !opt->debug)`
			`return;`

			`p = opt->pattern_list;`
git log: avoid segfault with --all-match Avoid a segfault when the command git log --all-match was issued, by ignoring the option. Signed-off-by: Michele Ballabio <barra_cuda@katamail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-18 21:53:27 +01:00			`if (p)`
			`opt->pattern_expression = compile_pattern_expr(&p);`
			`if (p)`
			`die("incomplete pattern expression: %s", p->pattern);`

			`if (!header_expr)`
			`return;`

log --author: take union of multiple "author" requests In the olden days, log --author=me --committer=him --grep=this --grep=that used to be turned into: (OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (PATTERN this) (PATTERN that)) showing my patches that do not have any "this" nor "that", which was totally useless. 80235ba ("log --author=me --grep=it" should find intersection, not union, 2010-01-17) improved it greatly to turn the same into: (ALL-MATCH (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that))) That is, "show only patches by me and committed by him, that have either this or that", which is a lot more natural thing to ask. We however need to be a bit more clever when the user asks more than one "author" (or "committer"); because a commit has only one author (and one committer), they ought to be interpreted as asking for union to be useful. The current implementation simply added another author/committer pattern at the same top-level for ALL-MATCH to insist on matching all, finding nothing. Turn log --author=me --author=her \ --committer=him --committer=you \ --grep=this --grep=that into (ALL-MATCH (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her)) (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you)) (OR (PATTERN this) (PATTERN that))) instead. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-09-13 07:15:35 +02:00			`if (!opt->pattern_expression)`
"log --author=me --grep=it" should find intersection, not union Historically, any grep filter in "git log" family of commands were taken as restricting to commits with any of the words in the commit log message. However, the user almost always want to find commits "done by this person on that topic". With "--all-match" option, a series of grep patterns can be turned into a requirement that all of them must produce a match, but that makes it impossible to ask for "done by me, on either this or that" with: log --author=me --committer=him --grep=this --grep=that because it will require both "this" and "that" to appear. Change the "header" parser of grep library to treat the headers specially, and parse it as: (all-match-OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that) ) ) Even though the "log" command line parser doesn't give direct access to the extended grep syntax to group terms with parentheses, this change will cover the majority of the case the users would want. This incidentally revealed that one test in t7002 was bogus. It ran: log --author=Thor --grep=Thu --format='%s' and expected (wrongly) "Thu" to match "Thursday" in the author/committer date, but that would never match, as the timestamp in raw commit buffer does not have the name of the day-of-the-week. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-18 05:09:06 +01:00			`opt->pattern_expression = header_expr;`
log --grep/--author: honor --all-match honored for multiple --grep patterns When we have both header expression (which has to be an OR node by construction) and a pattern expression (which could be anything), we create a new top-level OR node to bind them together, and the resulting expression structure looks like this: OR / \ / \ pattern OR / \ / \ ..... committer OR / \ author TRUE The three elements on the top-level backbone that are inspected by the "all-match" logic are "pattern", "committer" and "author". When there are more than one elements in the "pattern", the top-level node of the "pattern" part of the subtree is an OR, and that node is inspected by "all-match". The result ends up ignoring the "--all-match" given from the command line. A match on either side of the pattern is considered a match, hence: git log --grep=A --grep=B --author=C --all-match shows the same "authored by C and has either A or B" that is correct only when run without "--all-match". Fix this by turning the resulting expression around when "--all-match" is in effect, like this: OR / \ / \ / OR committer / \ author \ pattern The set of nodes on the top-level backbone in the resulting expression becomes "committer", "author", and the nodes that are on the top-level backbone of the "pattern" subexpression. This makes the "all-match" logic inspect the same nodes in "pattern" as the case without the author and/or the committer restriction, and makes the earlier "log" example to show "authored by C and has A and has B", which is what the command line expects. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-14 01:26:57 +02:00			`else if (opt->all_match)`
			`opt->pattern_expression = grep_splice_or(header_expr,`
			`opt->pattern_expression);`
log --author: take union of multiple "author" requests In the olden days, log --author=me --committer=him --grep=this --grep=that used to be turned into: (OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (PATTERN this) (PATTERN that)) showing my patches that do not have any "this" nor "that", which was totally useless. 80235ba ("log --author=me --grep=it" should find intersection, not union, 2010-01-17) improved it greatly to turn the same into: (ALL-MATCH (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that))) That is, "show only patches by me and committed by him, that have either this or that", which is a lot more natural thing to ask. We however need to be a bit more clever when the user asks more than one "author" (or "committer"); because a commit has only one author (and one committer), they ought to be interpreted as asking for union to be useful. The current implementation simply added another author/committer pattern at the same top-level for ALL-MATCH to insist on matching all, finding nothing. Turn log --author=me --author=her \ --committer=him --committer=you \ --grep=this --grep=that into (ALL-MATCH (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her)) (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you)) (OR (PATTERN this) (PATTERN that))) instead. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-09-13 07:15:35 +02:00			`else`
			`opt->pattern_expression = grep_or_expr(opt->pattern_expression,`
			`header_expr);`
"log --author=me --grep=it" should find intersection, not union Historically, any grep filter in "git log" family of commands were taken as restricting to commits with any of the words in the commit log message. However, the user almost always want to find commits "done by this person on that topic". With "--all-match" option, a series of grep patterns can be turned into a requirement that all of them must produce a match, but that makes it impossible to ask for "done by me, on either this or that" with: log --author=me --committer=him --grep=this --grep=that because it will require both "this" and "that" to appear. Change the "header" parser of grep library to treat the headers specially, and parse it as: (all-match-OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that) ) ) Even though the "log" command line parser doesn't give direct access to the extended grep syntax to group terms with parentheses, this change will cover the majority of the case the users would want. This incidentally revealed that one test in t7002 was bogus. It ran: log --author=Thor --grep=Thu --format='%s' and expected (wrongly) "Thu" to match "Thursday" in the author/committer date, but that would never match, as the timestamp in raw commit buffer does not have the name of the day-of-the-week. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-18 05:09:06 +01:00			`opt->all_match = 1;`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`}`

grep: teach --debug option to dump the parse tree Our "grep" allows complex boolean expressions to be formed to match each individual line with operators like --and, '(', ')' and --not. Introduce the "--debug" option to show the parse tree to help people who want to debug and enhance it. Also "log" learns "--grep-debug" option to do the same. The command line parser to the log family is a lot more limited than the general "git grep" parser, but it has special handling for header matching (e.g. "--author"), and a parse tree is valuable when working on it. Note that "--all-match" is not any individual node in the parse tree. It is an instruction to the evaluator to check all the nodes in the top-level backbone have matched and reject a document as non-matching otherwise. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-13 23:21:44 +02:00			`void compile_grep_patterns(struct grep_opt *opt)`
			`{`
			`compile_grep_patterns_real(opt);`
			`if (opt->debug)`
			`dump_grep_expression(opt);`
			`}`

grep: free expressions and patterns when done. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 01:27:10 +02:00			`static void free_pattern_expr(struct grep_expr *x)`
			`{`
			`switch (x->node) {`
log --author: take union of multiple "author" requests In the olden days, log --author=me --committer=him --grep=this --grep=that used to be turned into: (OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (PATTERN this) (PATTERN that)) showing my patches that do not have any "this" nor "that", which was totally useless. 80235ba ("log --author=me --grep=it" should find intersection, not union, 2010-01-17) improved it greatly to turn the same into: (ALL-MATCH (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that))) That is, "show only patches by me and committed by him, that have either this or that", which is a lot more natural thing to ask. We however need to be a bit more clever when the user asks more than one "author" (or "committer"); because a commit has only one author (and one committer), they ought to be interpreted as asking for union to be useful. The current implementation simply added another author/committer pattern at the same top-level for ALL-MATCH to insist on matching all, finding nothing. Turn log --author=me --author=her \ --committer=him --committer=you \ --grep=this --grep=that into (ALL-MATCH (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her)) (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you)) (OR (PATTERN this) (PATTERN that))) instead. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-09-13 07:15:35 +02:00			`case GREP_NODE_TRUE:`
grep: free expressions and patterns when done. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 01:27:10 +02:00			`case GREP_NODE_ATOM:`
			`break;`
			`case GREP_NODE_NOT:`
			`free_pattern_expr(x->u.unary);`
			`break;`
			`case GREP_NODE_AND:`
			`case GREP_NODE_OR:`
			`free_pattern_expr(x->u.binary.left);`
			`free_pattern_expr(x->u.binary.right);`
			`break;`
			`}`
			`free(x);`
			`}`

			`void free_grep_patterns(struct grep_opt *opt)`
			`{`
			`struct grep_pat *p, *n;`

			`for (p = opt->pattern_list; p; p = n) {`
			`n = p->next;`
			`switch (p->token) {`
			`case GREP_PATTERN: /* atom */`
			`case GREP_PATTERN_HEAD:`
			`case GREP_PATTERN_BODY:`
Use kwset in grep Benchmarks for the hot cache case: before: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 3,478,085 cache-misses # 2.322 M/sec ( +- 2.690% ) 11,356,177 cache-references # 7.582 M/sec ( +- 2.598% ) 3,872,184 branch-misses # 0.363 % ( +- 0.258% ) 1,067,367,848 branches # 712.673 M/sec ( +- 2.622% ) 3,828,370,782 instructions # 0.947 IPC ( +- 0.033% ) 4,043,832,831 cycles # 2700.037 M/sec ( +- 0.167% ) 8,518 page-faults # 0.006 M/sec ( +- 3.648% ) 847 CPU-migrations # 0.001 M/sec ( +- 3.262% ) 6,546 context-switches # 0.004 M/sec ( +- 2.292% ) 1497.695495 task-clock-msecs # 3.303 CPUs ( +- 2.550% ) 0.453394396 seconds time elapsed ( +- 0.912% ) after: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 2,989,918 cache-misses # 3.166 M/sec ( +- 5.013% ) 10,986,041 cache-references # 11.633 M/sec ( +- 4.899% ) (scaled from 95.06%) 3,511,993 branch-misses # 1.422 % ( +- 0.785% ) 246,893,561 branches # 261.433 M/sec ( +- 3.967% ) 1,392,727,757 instructions # 0.564 IPC ( +- 0.040% ) 2,468,142,397 cycles # 2613.494 M/sec ( +- 0.110% ) 7,747 page-faults # 0.008 M/sec ( +- 3.995% ) 897 CPU-migrations # 0.001 M/sec ( +- 2.383% ) 6,535 context-switches # 0.007 M/sec ( +- 1.993% ) 944.384228 task-clock-msecs # 3.177 CPUs ( +- 0.268% ) 0.297257643 seconds time elapsed ( +- 0.450% ) So we gain about 35% by using the kwset code. As a side effect of using kwset two grep tests are fixed by this patch. The first is fixed because kwset can deal with case-insensitive search containing NULs, something strcasestr cannot do. The second one is fixed because we consider patterns containing NULs as fixed strings (regcomp cannot accept patterns with NULs). Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:42:18 +02:00			`if (p->kws)`
			`kwsfree(p->kws);`
			`else if (p->pcre_regexp)`
git-grep: Learn PCRE This patch teaches git-grep the --perl-regexp/-P options (naming borrowed from GNU grep) in order to allow specifying PCRE regexes on the command line. PCRE has a number of features which make them more handy to use than POSIX regexes, like consistent escaping rules, extended character classes, ungreedy matching etc. git isn't build with PCRE support automatically. USE_LIBPCRE environment variable must be enabled (like `make USE_LIBPCRE=YesPlease`). Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 23:52:05 +02:00			`free_pcre_regexp(p);`
			`else`
			`regfree(&p->regexp);`
grep: support newline separated pattern list Currently, patterns that contain newline characters don't match anything when given to git grep. Regular grep(1) interprets patterns as lists of newline separated search strings instead. Implement this functionality by creating and inserting extra grep_pat structures for patterns consisting of multiple lines when appending to the pattern lists. For simplicity, all pattern strings are duplicated. The original pattern is truncated in place to make it contain only the first line. Requested-by: Torne (Richard Coles) <torne@google.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-20 16:33:07 +02:00			`free(p->pattern);`
grep: free expressions and patterns when done. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 01:27:10 +02:00			`break;`
			`default:`
			`break;`
			`}`
			`free(p);`
			`}`

			`if (!opt->extended)`
			`return;`
			`free_pattern_expr(opt->pattern_expression);`
			`}`

builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`static char *end_of_line(char *cp, unsigned long *left)`
			`{`
			`unsigned long l = *left;`
			`while (l && *cp != '\n') {`
			`l--;`
			`cp++;`
			`}`
			`*left = l;`
			`return cp;`
			`}`

			`static int word_char(char ch)`
			`{`
			`return isalnum(ch) || ch == '_';`
			`}`

grep: Colorize filename, line number, and separator Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:46 +01:00			`static void output_color(struct grep_opt *opt, const void *data, size_t size,`
			`const char *color)`
			`{`
color: delay auto-color decision until point of use When we read a color value either from a config file or from the command line, we use git_config_colorbool to convert it from the tristate always/never/auto into a single yes/no boolean value. This has some timing implications with respect to starting a pager. If we start (or decide not to start) the pager before checking the colorbool, everything is fine. Either isatty(1) will give us the right information, or we will properly check for pager_in_use(). However, if we decide to start a pager after we have checked the colorbool, things are not so simple. If stdout is a tty, then we will have already decided to use color. However, the user may also have configured color.pager not to use color with the pager. In this case, we need to actually turn off color. Unfortunately, the pager code has no idea which color variables were turned on (and there are many of them throughout the code, and they may even have been manipulated after the colorbool selection by something like "--color" on the command line). This bug can be seen any time a pager is started after config and command line options are checked. This has affected "git diff" since 89d07f7 (diff: don't run pager if user asked for a diff style exit code, 2007-08-12). It has also affect the log family since 1fda91b (Fix 'git log' early pager startup error case, 2010-08-24). This patch splits the notion of parsing a colorbool and actually checking the configuration. The "use_color" variables now have an additional possible value, GIT_COLOR_AUTO. Users of the variable should use the new "want_color()" wrapper, which will lazily determine and cache the auto-color decision. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-18 07:04:23 +02:00			`if (want_color(opt->color) && color && color[0]) {`
grep: Colorize filename, line number, and separator Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:46 +01:00			`opt->output(opt, color, strlen(color));`
			`opt->output(opt, data, size);`
			`opt->output(opt, GIT_COLOR_RESET, strlen(GIT_COLOR_RESET));`
			`} else`
			`opt->output(opt, data, size);`
			`}`

			`static void output_sep(struct grep_opt *opt, char sign)`
			`{`
			`if (opt->null_following_name)`
			`opt->output(opt, "\0", 1);`
			`else`
			`output_color(opt, &sign, 1, opt->color_sep);`
			`}`

git grep: Add "-z/--null" option as in GNU's grep. Here's a trivial patch that adds "-z" and "--null" options to "git grep". It was discussed on the mailing-list that git's "-z" convention should be used instead of GNU grep's "-Z". So things like 'git grep -l -z "$FOO" | xargs -0 sed -i "s/$FOO/$BOO/"' do work now. Signed-off-by: Raphael Zimmerer <killekulla@rdrz.de> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 18:11:15 +02:00			`static void show_name(struct grep_opt *opt, const char *name)`
			`{`
grep: Colorize filename, line number, and separator Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:46 +01:00			`output_color(opt, name, strlen(name), opt->color_filename);`
Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`opt->output(opt, opt->null_following_name ? "\0" : "\n", 1);`
git grep: Add "-z/--null" option as in GNU's grep. Here's a trivial patch that adds "-z" and "--null" options to "git grep". It was discussed on the mailing-list that git's "-z" convention should be used instead of GNU grep's "-Z". So things like 'git grep -l -z "$FOO" | xargs -0 sed -i "s/$FOO/$BOO/"' do work now. Signed-off-by: Raphael Zimmerer <killekulla@rdrz.de> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 18:11:15 +02:00			`}`

grep: support NUL chars in search strings for -F Search patterns in a file specified with -f can contain NUL characters. The current code ignores all characters on a line after a NUL. Pass the actual length of the line all the way from the pattern file to fixmatch() and use it for case-sensitive fixed string matching. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-22 23:43:43 +02:00			`static int fixmatch(struct grep_pat *p, char *line, char *eol,`
			`regmatch_t *match)`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`{`
Use kwset in grep Benchmarks for the hot cache case: before: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 3,478,085 cache-misses # 2.322 M/sec ( +- 2.690% ) 11,356,177 cache-references # 7.582 M/sec ( +- 2.598% ) 3,872,184 branch-misses # 0.363 % ( +- 0.258% ) 1,067,367,848 branches # 712.673 M/sec ( +- 2.622% ) 3,828,370,782 instructions # 0.947 IPC ( +- 0.033% ) 4,043,832,831 cycles # 2700.037 M/sec ( +- 0.167% ) 8,518 page-faults # 0.006 M/sec ( +- 3.648% ) 847 CPU-migrations # 0.001 M/sec ( +- 3.262% ) 6,546 context-switches # 0.004 M/sec ( +- 2.292% ) 1497.695495 task-clock-msecs # 3.303 CPUs ( +- 2.550% ) 0.453394396 seconds time elapsed ( +- 0.912% ) after: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 2,989,918 cache-misses # 3.166 M/sec ( +- 5.013% ) 10,986,041 cache-references # 11.633 M/sec ( +- 4.899% ) (scaled from 95.06%) 3,511,993 branch-misses # 1.422 % ( +- 0.785% ) 246,893,561 branches # 261.433 M/sec ( +- 3.967% ) 1,392,727,757 instructions # 0.564 IPC ( +- 0.040% ) 2,468,142,397 cycles # 2613.494 M/sec ( +- 0.110% ) 7,747 page-faults # 0.008 M/sec ( +- 3.995% ) 897 CPU-migrations # 0.001 M/sec ( +- 2.383% ) 6,535 context-switches # 0.007 M/sec ( +- 1.993% ) 944.384228 task-clock-msecs # 3.177 CPUs ( +- 0.268% ) 0.297257643 seconds time elapsed ( +- 0.450% ) So we gain about 35% by using the kwset code. As a side effect of using kwset two grep tests are fixed by this patch. The first is fixed because kwset can deal with case-insensitive search containing NULs, something strcasestr cannot do. The second one is fixed because we consider patterns containing NULs as fixed strings (regcomp cannot accept patterns with NULs). Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:42:18 +02:00			`struct kwsmatch kwsm;`
			`size_t offset = kwsexec(p->kws, line, eol - line, &kwsm);`
			`if (offset == -1) {`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`match->rm_so = match->rm_eo = -1;`
			`return REG_NOMATCH;`
Use kwset in grep Benchmarks for the hot cache case: before: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 3,478,085 cache-misses # 2.322 M/sec ( +- 2.690% ) 11,356,177 cache-references # 7.582 M/sec ( +- 2.598% ) 3,872,184 branch-misses # 0.363 % ( +- 0.258% ) 1,067,367,848 branches # 712.673 M/sec ( +- 2.622% ) 3,828,370,782 instructions # 0.947 IPC ( +- 0.033% ) 4,043,832,831 cycles # 2700.037 M/sec ( +- 0.167% ) 8,518 page-faults # 0.006 M/sec ( +- 3.648% ) 847 CPU-migrations # 0.001 M/sec ( +- 3.262% ) 6,546 context-switches # 0.004 M/sec ( +- 2.292% ) 1497.695495 task-clock-msecs # 3.303 CPUs ( +- 2.550% ) 0.453394396 seconds time elapsed ( +- 0.912% ) after: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 2,989,918 cache-misses # 3.166 M/sec ( +- 5.013% ) 10,986,041 cache-references # 11.633 M/sec ( +- 4.899% ) (scaled from 95.06%) 3,511,993 branch-misses # 1.422 % ( +- 0.785% ) 246,893,561 branches # 261.433 M/sec ( +- 3.967% ) 1,392,727,757 instructions # 0.564 IPC ( +- 0.040% ) 2,468,142,397 cycles # 2613.494 M/sec ( +- 0.110% ) 7,747 page-faults # 0.008 M/sec ( +- 3.995% ) 897 CPU-migrations # 0.001 M/sec ( +- 2.383% ) 6,535 context-switches # 0.007 M/sec ( +- 1.993% ) 944.384228 task-clock-msecs # 3.177 CPUs ( +- 0.268% ) 0.297257643 seconds time elapsed ( +- 0.450% ) So we gain about 35% by using the kwset code. As a side effect of using kwset two grep tests are fixed by this patch. The first is fixed because kwset can deal with case-insensitive search containing NULs, something strcasestr cannot do. The second one is fixed because we consider patterns containing NULs as fixed strings (regcomp cannot accept patterns with NULs). Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:42:18 +02:00			`} else {`
			`match->rm_so = offset;`
			`match->rm_eo = match->rm_so + kwsm.size[0];`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`return 0;`
			`}`
			`}`

grep: use REG_STARTEND for all matching if available Refactor REG_STARTEND handling in look_ahead() into a new helper, regmatch(), and use it for line matching, too. This allows regex matching beyond NUL characters if regexec() supports the flag. NUL characters themselves are not matched in any way, though. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-22 23:35:07 +02:00			`static int regmatch(const regex_t *preg, char *line, char *eol,`
			`regmatch_t *match, int eflags)`
			`{`
			`#ifdef REG_STARTEND`
			`match->rm_so = 0;`
			`match->rm_eo = eol - line;`
			`eflags |= REG_STARTEND;`
			`#endif`
			`return regexec(preg, line, 1, match, eflags);`
			`}`
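The span-limited matching done by regmatch() can be tried in isolation. The following is a minimal sketch, not the code git ships: `match_span` is a hypothetical name, and it assumes only POSIX `regexec()` plus the optional `REG_STARTEND` extension, which it guards the same way the function above does.

```c
#include <assert.h>
#include <regex.h>

/*
 * Hypothetical helper mirroring regmatch(): match `preg` against the
 * bytes in [line, eol). Returns 0 on a match, like regexec().
 */
static int match_span(const regex_t *preg, const char *line, const char *eol,
		      regmatch_t *match, int eflags)
{
	assert(line <= eol);
#ifdef REG_STARTEND
	/* Hand regexec() the exact span, so an embedded NUL does not end it. */
	match->rm_so = 0;
	match->rm_eo = eol - line;
	eflags |= REG_STARTEND;
#endif
	/* Without REG_STARTEND the search stops at the first NUL byte. */
	return regexec(preg, line, 1, match, eflags);
}
```

On success, `match->rm_so` and `match->rm_eo` are offsets from `line`, which is what the callers in this file rely on.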

grep: Put calls to fixmatch() and regmatch() into patmatch() Both match_one_pattern() and look_ahead() use fixmatch() and regmatch() in the same way. They really want to match a pattern against a string, but now they need to know if the pattern is fixed or regexp. This change cleans this up by introducing patmatch() (from "pattern match") and also simplifies inserting other ways of matching a string. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-05 00:00:19 +02:00			`static int patmatch(struct grep_pat *p, char *line, char *eol,`
			`regmatch_t *match, int eflags)`
			`{`
			`int hit;`

			`if (p->fixed)`
			`hit = !fixmatch(p, line, eol, match);`
git-grep: Learn PCRE This patch teaches git-grep the --perl-regexp/-P options (naming borrowed from GNU grep) in order to allow specifying PCRE regexes on the command line. PCRE has a number of features which make them more handy to use than POSIX regexes, like consistent escaping rules, extended character classes, ungreedy matching etc. git isn't built with PCRE support automatically. USE_LIBPCRE environment variable must be enabled (like `make USE_LIBPCRE=YesPlease`). Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 23:52:05 +02:00			`else if (p->pcre_regexp)`
			`hit = !pcrematch(p, line, eol, match, eflags);`
			`else`
			`hit = !regmatch(&p->regexp, line, eol, match, eflags);`

			`return hit;`
			`}`

log --author/--committer: really match only with name part When we tried to find commits done by AUTHOR, the first implementation tried to pattern match a line with "^author .AUTHOR", which later was enhanced to strip leading caret and look for "^author AUTHOR" when the search pattern was anchored at the left end (i.e. --author="^AUTHOR"). This had a few problems: When looking for fixed strings (e.g. "git log -F --author=x --grep=y"), the regexp internally used "^author .x" would never match anything; To match at the end (e.g. "git log --author='google.com>$'"), the generated regexp has to also match the trailing timestamp part the commit header lines have. Also, in order to determine if the '$' at the end means "match at the end of the line" or just a literal dollar sign (probably backslash-quoted), we would need to parse the regexp ourselves. An earlier alternative tried to make sure that a line matches "^author " (to limit by field name) and the user supplied pattern at the same time. While it solved the -F problem by introducing a special override for matching the "^author ", it did not solve the trailing timestamp nor tail match problem. It also would have matched every commit if --author=author was asked for, not because the author's email part had this string, but because every commit header line that talks about the author begins with that field name, regardless of who wrote it. Instead of piling more hacks on top of hacks, this rethinks the grep machinery that is used to look for strings in the commit header, and makes sure that (1) field name matches literally at the beginning of the line, followed by a SP, and (2) the user supplied pattern is matched against the remainder of the line, excluding the trailing timestamp data. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-09-05 07:15:02 +02:00			`static int strip_timestamp(char *bol, char **eol_p)`
			`{`
			`char *eol = *eol_p;`
			`int ch;`

			`while (bol < --eol) {`
			`if (*eol != '>')`
			`continue;`
			`*eol_p = ++eol;`
			`ch = *eol;`
			`*eol = '\0';`
			`return ch;`
			`}`
			`return 0;`
			`}`
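To make the timestamp stripping concrete, here is a small standalone sketch of the same idea. The name `cut_after_email` and the sample header text are assumptions made for illustration; the loop is written with an explicit length check instead of `bol < --eol`, but is meant to visit the same positions.

```c
#include <assert.h>
#include <string.h>

/*
 * Scan backwards from the end of a header line for the '>' that closes
 * the e-mail address, terminate the line right after it, and return the
 * overwritten character so the caller can put it back afterwards.
 * Returns 0 (and changes nothing) when no '>' is found.
 */
static int cut_after_email(char *bol, char **eol_p)
{
	char *eol = *eol_p;
	int ch;

	assert(bol <= eol);
	while (eol - bol > 1) {
		eol--;
		if (*eol != '>')
			continue;
		*eol_p = ++eol;	/* new end of line: just past the '>' */
		ch = *eol;
		*eol = '\0';
		return ch;
	}
	return 0;
}
```

A caller would match its pattern against the shortened line and then restore the saved character with `*eol = ch;`, which is what the `saved_ch` handling in match_one_pattern() does.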

			`static struct {`
			`const char *field;`
			`size_t len;`
			`} header_field[] = {`
			`{ "author ", 7 },`
			`{ "committer ", 10 },`
revision: add --grep-reflog to filter commits by reflog messages Similar to --author/--committer which filters commits by author and committer header fields. --grep-reflog adds a fake "reflog" header to commit and a grep filter to search on that line. All rules to --author/--committer apply except no timestamp stripping. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-29 06:41:28 +02:00			`{ "reflog ", 7 },`
			`};`
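The table stores each field name together with its trailing space and its length, so a header line can be tested with one `strncmp()` call and then skipped by adding the length. A minimal sketch of that lookup follows; `demo_header_field` and `skip_header_field` are hypothetical names, and the table is an illustrative copy rather than the one defined above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative copy: field name including the trailing space, and its length. */
static const struct {
	const char *field;
	size_t len;
} demo_header_field[] = {
	{ "author ", 7 },
	{ "committer ", 10 },
	{ "reflog ", 7 },
};

/*
 * Return the text that follows the field name if `line` starts with
 * field number `idx`, or NULL if the line belongs to another field.
 */
static const char *skip_header_field(const char *line, size_t idx)
{
	assert(idx < sizeof(demo_header_field) / sizeof(demo_header_field[0]));
	if (strncmp(line, demo_header_field[idx].field, demo_header_field[idx].len))
		return NULL;
	return line + demo_header_field[idx].len;
}
```

Because the stored name ends in a space, a line such as `authority` does not count as an `author` header.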

grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00			`static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,`
grep: add pmatch and eflags arguments to match_one_pattern() Push pmatch and eflags to the callers of match_one_pattern(), which allows them to specify regex execution flags and to get the location of a match. Since we only use the first element of the matches array and aren't interested in submatches, no provision is made for callers to provide a larger array. eflags are ignored for fixed patterns, but that's OK, since they only have a meaning in connection with regular expressions containing ^ or $. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:30:27 +01:00			`enum grep_context ctx,`
			`regmatch_t *pmatch, int eflags)`
			`{`
			`int hit = 0;`
			`int saved_ch = 0;`
grep: fix word-regexp colouring As noticed by Dmitry Gryazin: When a pattern is found but it doesn't start and end at word boundaries, bol is forwarded to after the match and the pattern is searched again. When a pattern is finally found between word boundaries, the match offsets are off by the number of characters that have been skipped. This patch corrects the offsets to be relative to the value of bol as passed to match_one_pattern() by its caller. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-05-20 23:31:53 +02:00			`const char *start = bol;`

Update grep internal for grepping only in head/body This further updates the built-in grep engine so that we can say something like "this pattern should match only in head". This can be used to simplify grepping in the log messages. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-20 21:39:46 +02:00			`if ((p->token != GREP_PATTERN) &&`
			`((p->token == GREP_PATTERN_HEAD) != (ctx == GREP_CONTEXT_HEAD)))`
			`return 0;`

			`if (p->token == GREP_PATTERN_HEAD) {`
			`const char *field;`
			`size_t len;`
			`assert(p->field < ARRAY_SIZE(header_field));`
			`field = header_field[p->field].field;`
			`len = header_field[p->field].len;`
			`if (strncmp(bol, field, len))`
			`return 0;`
			`bol += len;`
grep: prepare for new header field filter grep supports only author and committer headers, which have the same special treatment that later headers may or may not have. Check for field type and only strip_timestamp() when the field is either author or committer. GREP_HEADER_FIELD_MAX is put in the grep_header_field enum to be calculated automatically, correctly, as long as it's at the end of the enum. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-29 06:41:27 +02:00			`switch (p->field) {`
			`case GREP_HEADER_AUTHOR:`
			`case GREP_HEADER_COMMITTER:`
			`saved_ch = strip_timestamp(bol, &eol);`
			`break;`
			`default:`
			`break;`
			`}`
			`}`

			`again:`
			`hit = patmatch(p, bol, eol, pmatch, eflags);`

			`if (hit && p->word_regexp) {`
			`if ((pmatch[0].rm_so < 0) ||`
grep: fix empty word-regexp matches The command "git grep -w ''" dies as soon as it encounters an empty line, reporting (wrongly) that "regexp returned nonsense". The first hunk of this patch relaxes the sanity check that is responsible for that, allowing matches to start at the end. The second hunk complements it by making sure that empty matches are rejected if -w was specified, as they are not really words. GNU grep does the same: $ echo foo | grep -c '' 1 $ echo foo | grep -c -w '' 0 Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-03 18:19:01 +02:00			`(eol - bol) < pmatch[0].rm_so ||`
			`(pmatch[0].rm_eo < 0) ||`
			`(eol - bol) < pmatch[0].rm_eo)`
			`die("regexp returned nonsense");`

			`/* Match beginning must be either beginning of the`
			`* line, or at word boundary (i.e. the last char must`
			`* not be a word char). Similarly, match end must be`
			`* either end of the line, or at word boundary`
			`* (i.e. the next char must not be a word char).`
			`*/`
grep -w: forward to next possible position after rejected match grep -w accepts matches between non-word characters, only. If a match from regexec() doesn't meet this criteria, grep continues its search after the first character of that match. We can be a bit smarter here and skip all positions that follow a word character first, as they can't match our criteria. This way we can consume characters quite cheaply and don't need to special-case the handling of the beginning of a line. Here's a contrived example command on msysgit (best of five runs): $ time git grep -w ...... v1.6.1 >/dev/null real 0m1.611s user 0m0.000s sys 0m0.015s With the patch it's quite a bit faster: $ time git grep -w ...... v1.6.1 >/dev/null real 0m1.179s user 0m0.000s sys 0m0.015s More common search patterns will gain a lot less, but it's a nice clean up anyway. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-10 00:08:40 +01:00			`if ( ((pmatch[0].rm_so == 0) ||`
			`!word_char(bol[pmatch[0].rm_so-1])) &&`
			`((pmatch[0].rm_eo == (eol-bol)) ||`
			`!word_char(bol[pmatch[0].rm_eo])) )`
			`;`
			`else`
			`hit = 0;`

			`/* Words consist of at least one character. */`
			`if (pmatch->rm_so == pmatch->rm_eo)`
			`hit = 0;`

			`if (!hit && pmatch[0].rm_so + bol + 1 < eol) {`
			`/* There could be more than one match on the`
			`* line, and the first match might not be`
			`* strict word match. But later ones could be!`
			`* Forward to the next possible start, i.e. the`
			`* next position following a non-word char.`
			`*/`
			`bol = pmatch[0].rm_so + bol + 1;`
			`while (word_char(bol[-1]) && bol < eol)`
			`bol++;`
grep: fix word-regexp at the beginning of lines After bol is forwarded, it doesn't represent the beginning of the line any more. This means that the beginning-of-line marker (^) mustn't match, i.e. the regex flag REG_NOTBOL needs to be set. This bug was introduced by fb62eb7fab97cea880ea7fe4f341a4dfad14ab48 ("grep -w: forward to next possible position after rejected match"). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-05-23 13:45:26 +02:00			`eflags |= REG_NOTBOL;`
			`if (bol < eol)`
			`goto again;`
			`}`
			`}`
log --author/--committer: really match only with name part When we tried to find commits done by AUTHOR, the first implementation tried to pattern match a line with "^author .AUTHOR", which later was enhanced to strip leading caret and look for "^author AUTHOR" when the search pattern was anchored at the left end (i.e. --author="^AUTHOR"). This had a few problems: When looking for fixed strings (e.g. "git log -F --author=x --grep=y"), the regexp internally used "^author .x" would never match anything; To match at the end (e.g. "git log --author='google.com>$'"), the generated regexp has to also match the trailing timestamp part the commit header lines have. Also, in order to determine if the '$' at the end means "match at the end of the line" or just a literal dollar sign (probably backslash-quoted), we would need to parse the regexp ourselves. An earlier alternative tried to make sure that a line matches "^author " (to limit by field name) and the user supplied pattern at the same time. While it solved the -F problem by introducing a special override for matching the "^author ", it did not solve the trailing timestamp nor tail match problem. It also would have matched every commit if --author=author was asked for, not because the author's email part had this string, but because every commit header line that talks about the author begins with that field name, regardless of who wrote it. Instead of piling more hacks on top of hacks, this rethinks the grep machinery that is used to look for strings in the commit header, and makes sure that (1) field name matches literally at the beginning of the line, followed by a SP, and (2) the user supplied pattern is matched against the remainder of the line, excluding the trailing timestamp data. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-09-05 07:15:02 +02:00			`if (p->token == GREP_PATTERN_HEAD && saved_ch)`
			`*eol = saved_ch;`
grep: fix word-regexp colouring As noticed by Dmitry Gryazin: When a pattern is found but it doesn't start and end at word boundaries, bol is forwarded to after the match and the pattern is searched again. When a pattern is finally found between word boundaries, the match offsets are off by the number of characters that have been skipped. This patch corrects the offsets to be relative to the value of bol as passed to match_one_pattern() by its caller. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-05-20 23:31:53 +02:00			`if (hit) {`
			`pmatch[0].rm_so += bol - start;`
			`pmatch[0].rm_eo += bol - start;`
			`}`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`return hit;`
			`}`
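
The word-match retry described in the commit messages above (forward `bol` past a rejected match, set `REG_NOTBOL`, then correct the offsets) can be sketched in isolation. This is a minimal illustration using POSIX `regexec()`, not the code from `grep.c`: the names `is_word_char`, `match_word` and `word_match_start` are hypothetical, and the line is assumed to be NUL-terminated rather than delimited by an `eol` pointer.

```c
#include <ctype.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for the word-character test used by grep.c. */
static int is_word_char(char ch)
{
	return isalnum((unsigned char)ch) || ch == '_';
}

/*
 * Find the first match of `re` in `line` that is delimited by non-word
 * characters.  On success the offsets in `pmatch` are relative to `line`,
 * not to the position the search was forwarded to.
 */
int match_word(const regex_t *re, const char *line, regmatch_t *pmatch)
{
	const char *start = line;
	const char *bol = line;
	int eflags = 0;

	while (*bol) {
		const char *ms, *me;

		if (regexec(re, bol, 1, pmatch, eflags) != 0)
			return 0;
		ms = bol + pmatch->rm_so;
		me = bol + pmatch->rm_eo;
		if (ms != me &&
		    (ms == start || !is_word_char(ms[-1])) &&
		    (*me == '\0' || !is_word_char(*me))) {
			/* Make the offsets relative to the original line. */
			pmatch->rm_so += bol - start;
			pmatch->rm_eo += bol - start;
			return 1;
		}
		if (*ms == '\0')
			return 0;	/* rejected at end of line, nothing to retry */
		/* Skip the first character of the rejected match ... */
		bol = ms + 1;
		/* ... and every position that directly follows a word character. */
		while (*bol && is_word_char(bol[-1]))
			bol++;
		/* bol is no longer the start of the line, so ^ must not match. */
		eflags |= REG_NOTBOL;
	}
	return 0;
}

/* Convenience wrapper: start offset of the first word match, or -1. */
int word_match_start(const char *pattern, const char *line)
{
	regex_t re;
	regmatch_t m;
	int hit;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	hit = match_word(&re, line, &m);
	regfree(&re);
	return hit ? (int)m.rm_so : -1;
}
```

With this sketch, `word_match_start("^foo", "foox foo")` finds nothing: the first candidate is rejected because `x` follows it, and the second `foo` cannot satisfy `^` once `REG_NOTBOL` is set.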

grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00			`static int match_expr_eval(struct grep_expr *x, char *bol, char *eol,`
			`enum grep_context ctx, int collect_hits)`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`{`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`int h = 0;`
grep: add pmatch and eflags arguments to match_one_pattern() Push pmatch and eflags to the callers of match_one_pattern(), which allows them to specify regex execution flags and to get the location of a match. Since we only use the first element of the matches array and aren't interested in submatches, no provision is made for callers to provide a larger array. eflags are ignored for fixed patterns, but that's OK, since they only have a meaning in connection with regular expressions containing ^ or $. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:30:27 +01:00			`regmatch_t match;`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00
grep: fix segfault when "git grep '('" is given Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-27 20:10:24 +02:00			`if (!x)`
			`die("Not a valid grep expression");`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`switch (x->node) {`
log --author: take union of multiple "author" requests In the olden days, log --author=me --committer=him --grep=this --grep=that used to be turned into: (OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (PATTERN this) (PATTERN that)) showing my patches that do not have any "this" nor "that", which was totally useless. 80235ba ("log --author=me --grep=it" should find intersection, not union, 2010-01-17) improved it greatly to turn the same into: (ALL-MATCH (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that))) That is, "show only patches by me and committed by him, that have either this or that", which is a lot more natural thing to ask. We however need to be a bit more clever when the user asks more than one "author" (or "committer"); because a commit has only one author (and one committer), they ought to be interpreted as asking for union to be useful. The current implementation simply added another author/committer pattern at the same top-level for ALL-MATCH to insist on matching all, finding nothing. Turn log --author=me --author=her \ --committer=him --committer=you \ --grep=this --grep=that into (ALL-MATCH (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her)) (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you)) (OR (PATTERN this) (PATTERN that))) instead. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-09-13 07:15:35 +02:00			`case GREP_NODE_TRUE:`
			`h = 1;`
			`break;`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`case GREP_NODE_ATOM:`
grep: add pmatch and eflags arguments to match_one_pattern() Push pmatch and eflags to the callers of match_one_pattern(), which allows them to specify regex execution flags and to get the location of a match. Since we only use the first element of the matches array and aren't interested in submatches, no provision is made for callers to provide a larger array. eflags are ignored for fixed patterns, but that's OK, since they only have a meaning in connection with regular expressions containing ^ or $. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:30:27 +01:00			`h = match_one_pattern(x->u.atom, bol, eol, ctx, &match, 0);`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`break;`
			`case GREP_NODE_NOT:`
grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00			`h = !match_expr_eval(x->u.unary, bol, eol, ctx, 0);`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`break;`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`case GREP_NODE_AND:`
grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00			`if (!match_expr_eval(x->u.binary.left, bol, eol, ctx, 0))`
grep: micro-optimize hit collection for AND nodes In addition to returning if an expression matches a line, match_expr_eval() updates the expression's hit flag if the parameter collect_hits is set. It never sets collect_hits for children of AND nodes, though, so their hit flag will never be updated. Because of that we can return early if the first child didn't match, no matter if collect_hits is set or not. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:27:15 +01:00			`return 0;`
grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00			`h = match_expr_eval(x->u.binary.right, bol, eol, ctx, 0);`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`break;`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`case GREP_NODE_OR:`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`if (!collect_hits)`
grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00			`return (match_expr_eval(x->u.binary.left,`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`bol, eol, ctx, 0) ||`
grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00			`match_expr_eval(x->u.binary.right,`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`bol, eol, ctx, 0));`
grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00			`h = match_expr_eval(x->u.binary.left, bol, eol, ctx, 0);`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`x->u.binary.left->hit |= h;`
grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00			`h |= match_expr_eval(x->u.binary.right, bol, eol, ctx, 1);`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`break;`
			`default:`
remove trailing LF in die() messages LF at the end of format strings given to die() is redundant because die already adds one on its own. Signed-off-by: Alexander Potashev <aspotashev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-04 19:38:41 +01:00			`die("Unexpected node type (internal error) %d", x->node);`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`}`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`if (collect_hits)`
			`x->hit |= h;`
			`return h;`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`}`
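
The way `collect_hits` interacts with the OR and AND nodes is easier to see on a stripped-down tree. The following is a sketch under assumptions, not the structures from `grep.h`: `struct expr`, `enum node_type` and `eval_expr` are hypothetical names, and atoms are plain substrings matched with `strstr()` instead of compiled patterns.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified expression tree; atoms are fixed substrings. */
enum node_type { NODE_ATOM, NODE_NOT, NODE_AND, NODE_OR };

struct expr {
	enum node_type node;
	unsigned hit;			/* sticky across lines, used for all-match */
	const char *atom;		/* NODE_ATOM only */
	struct expr *left, *right;	/* NODE_NOT uses left only */
};

/* Returns whether `line` matches; optionally records per-node hits. */
int eval_expr(struct expr *x, const char *line, int collect_hits)
{
	int h = 0;

	switch (x->node) {
	case NODE_ATOM:
		h = strstr(line, x->atom) != NULL;
		break;
	case NODE_NOT:
		h = !eval_expr(x->left, line, 0);
		break;
	case NODE_AND:
		/* Children of AND never collect hits, so returning early is safe. */
		if (!eval_expr(x->left, line, 0))
			return 0;
		h = eval_expr(x->right, line, 0);
		break;
	case NODE_OR:
		if (!collect_hits)
			return eval_expr(x->left, line, 0) ||
			       eval_expr(x->right, line, 0);
		/* When collecting, both sides must run so every hit is seen. */
		h = eval_expr(x->left, line, 0);
		x->left->hit |= h;
		h |= eval_expr(x->right, line, 1);
		break;
	}
	if (collect_hits)
		x->hit |= h;
	return h;
}
```

A caller that wants the `--all-match` behaviour described above would evaluate every line with `collect_hits` set and, once the buffer is exhausted, check that each top-level alternative has its `hit` flag set.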

Update grep internal for grepping only in head/body This further updates the built-in grep engine so that we can say something like "this pattern should match only in head". This can be used to simplify grepping in the log messages. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-20 21:39:46 +02:00			`static int match_expr(struct grep_opt *opt, char *bol, char *eol,`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`enum grep_context ctx, int collect_hits)`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`{`
			`struct grep_expr *x = opt->pattern_expression;`
grep: remove grep_opt argument from match_expr_eval() The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:28:40 +01:00			`return match_expr_eval(x, bol, eol, ctx, collect_hits);`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`}`

Update grep internal for grepping only in head/body This further updates the built-in grep engine so that we can say something like "this pattern should match only in head". This can be used to simplify grepping in the log messages. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-20 21:39:46 +02:00			`static int match_line(struct grep_opt *opt, char *bol, char *eol,`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`enum grep_context ctx, int collect_hits)`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`{`
			`struct grep_pat *p;`
grep: add pmatch and eflags arguments to match_one_pattern() Push pmatch and eflags to the callers of match_one_pattern(), which allows them to specify regex execution flags and to get the location of a match. Since we only use the first element of the matches array and aren't interested in submatches, no provision is made for callers to provide a larger array. eflags are ignored for fixed patterns, but that's OK, since they only have a meaning in connection with regular expressions containing ^ or $. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:30:27 +01:00			`regmatch_t match;`

builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`if (opt->extended)`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`return match_expr(opt, bol, eol, ctx, collect_hits);`

			`/* we do not call with collect_hits without being extended */`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`for (p = opt->pattern_list; p; p = p->next) {`
grep: add pmatch and eflags arguments to match_one_pattern() Push pmatch and eflags to the callers of match_one_pattern(), which allows them to specify regex execution flags and to get the location of a match. Since we only use the first element of the matches array and aren't interested in submatches, no provision is made for callers to provide a larger array. eflags are ignored for fixed patterns, but that's OK, since they only have a meaning in connection with regular expressions containing ^ or $. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:30:27 +01:00			`if (match_one_pattern(p, bol, eol, ctx, &match, 0))`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`return 1;`
			`}`
			`return 0;`
			`}`

grep: color patterns in output Coloring matches makes them easier to spot in the output. Add two options and two parameters: color.grep (to turn coloring on or off), color.grep.match (to set the color of matches), --color and --no-color (to turn coloring on or off, respectively). The output of external greps is not changed. This patch is based on earlier ones by Nguyễn Thái Ngọc Duy and Thiago Alves. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:32:32 +01:00			`static int match_next_pattern(struct grep_pat *p, char *bol, char *eol,`
			`enum grep_context ctx,`
			`regmatch_t *pmatch, int eflags)`
			`{`
			`regmatch_t match;`

			`if (!match_one_pattern(p, bol, eol, ctx, &match, eflags))`
			`return 0;`
			`if (match.rm_so < 0 || match.rm_eo < 0)`
			`return 0;`
			`if (pmatch->rm_so >= 0 && pmatch->rm_eo >= 0) {`
			`if (match.rm_so > pmatch->rm_so)`
			`return 1;`
			`if (match.rm_so == pmatch->rm_so && match.rm_eo < pmatch->rm_eo)`
			`return 1;`
			`}`
			`pmatch->rm_so = match.rm_so;`
			`pmatch->rm_eo = match.rm_eo;`
			`return 1;`
			`}`

			`static int next_match(struct grep_opt *opt, char *bol, char *eol,`
			`enum grep_context ctx, regmatch_t *pmatch, int eflags)`
			`{`
			`struct grep_pat *p;`
			`int hit = 0;`

			`pmatch->rm_so = pmatch->rm_eo = -1;`
			`if (bol < eol) {`
			`for (p = opt->pattern_list; p; p = p->next) {`
			`switch (p->token) {`
			`case GREP_PATTERN: /* atom */`
			`case GREP_PATTERN_HEAD:`
			`case GREP_PATTERN_BODY:`
			`hit |= match_next_pattern(p, bol, eol, ctx,`
			`pmatch, eflags);`
			`break;`
			`default:`
			`break;`
			`}`
			`}`
			`}`
			`return hit;`
			`}`
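
The selection rule implemented by `match_next_pattern()` and `next_match()` above (keep the match that starts earliest, and on a tie keep the longer one) can be sketched in isolation with POSIX `regexec()`. This is a minimal sketch, not code from grep.c: `earliest_match()` and its parameters are hypothetical names, and it assumes every pattern is an already compiled `regex_t`, whereas the real code also handles fixed strings and header/body contexts.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of grep.c): store in *best the match that
 * starts earliest in `line` among `n` compiled patterns, preferring the
 * longer match when two start at the same offset.
 * Returns 1 if any pattern matched, 0 otherwise.
 */
static int earliest_match(const regex_t *pats, size_t n, const char *line,
			  regmatch_t *best, int eflags)
{
	int hit = 0;
	size_t i;

	best->rm_so = best->rm_eo = -1;	/* "no match recorded yet" */
	for (i = 0; i < n; i++) {
		regmatch_t m;

		if (regexec(&pats[i], line, 1, &m, eflags) != 0)
			continue;	/* this pattern does not match */
		hit = 1;
		if (best->rm_so >= 0) {
			if (m.rm_so > best->rm_so)
				continue;	/* starts later: keep current best */
			if (m.rm_so == best->rm_so && m.rm_eo <= best->rm_eo)
				continue;	/* same start, not longer */
		}
		*best = m;
	}
	return hit;
}
```

As in `next_match()`, the return value only says whether some pattern matched; the offsets in `best` are relative to the start of `line`.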

			`static void show_line(struct grep_opt *opt, char *bol, char *eol,`
			`const char *name, unsigned lno, char sign)`
			`{`
			`int rest = eol - bol;`
grep: add color.grep.matchcontext and color.grep.matchselected The config option color.grep.match can be used to specify the highlighting color for matching strings. Add the options matchContext and matchSelected to allow different colors to be specified for matching strings in the context vs. in selected lines. This is similar to the ms and mc specifiers in GNU grep's environment variable GREP_COLORS. Tests are from Zoltan Klinger's earlier attempt to solve the same issue in a different way. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-10-27 19:23:05 +01:00			`const char *match_color, *line_color = NULL;`
grep: color patterns in output Coloring matches makes them easier to spot in the output. Add two options and two parameters: color.grep (to turn coloring on or off), color.grep.match (to set the color of matches), --color and --no-color (to turn coloring on or off, respectively). The output of external greps is not changed. This patch is based on earlier ones by Nguyễn Thái Ngọc Duy and Thiago Alves. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:32:32 +01:00
grep: add --break With --break, an empty line is printed between matches from different files, increasing readability. This option is taken from ack (http://betterthangrep.com/). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-05 17:24:25 +02:00			`if (opt->file_break && opt->last_shown == 0) {`
			`if (opt->show_hunk_mark)`
			`opt->output(opt, "\n", 1);`
grep: add option to show whole function as context Add a new option, -W, to show the whole surrounding function of a match. It uses the same regular expressions as -p and diff to find the beginning of sections. Currently it will not display comments in front of a function, but those that are following one. Despite this shortcoming it is already useful, e.g. to simply see a more complete applicable context or to extract whole functions. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-01 19:20:53 +02:00			`} else if (opt->pre_context || opt->post_context || opt->funcbody) {`
grep: print context hunk marks between files Print a hunk mark before matches from a new file are shown, in addition to the current behaviour of printing them if lines have been skipped. The result is easier to read, as (presumably unrelated) matches from different files are separated by a hunk mark. GNU grep does the same. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:03:44 +02:00			`if (opt->last_shown == 0) {`
grep: Colorize filename, line number, and separator Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:46 +01:00			`if (opt->show_hunk_mark) {`
			`output_color(opt, "--", 2, opt->color_sep);`
			`opt->output(opt, "\n", 1);`
Merge branch 'rs/threaded-grep-context' * rs/threaded-grep-context: grep: enable threading for context line printing Conflicts: grep.c 2010-04-03 21:28:39 +02:00			`}`
grep: Colorize filename, line number, and separator Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:46 +01:00			`} else if (lno > opt->last_shown + 1) {`
			`output_color(opt, "--", 2, opt->color_sep);`
			`opt->output(opt, "\n", 1);`
			`}`
grep: move context hunk mark handling into show_line() Move last_shown into struct grep_opt, to make it available in show_line(), and then make the function handle the printing of hunk marks for context lines in a central place. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:02:38 +02:00			`}`
grep: add --heading With --heading, the filename is printed once before matches from that file instead of at the start of each line, giving more screen space to the actual search results. This option is taken from ack (http://betterthangrep.com/). And now git grep can dress up like it: $ git config alias.ack "grep --break --heading --line-number" $ git ack -e --heading Documentation/git-grep.txt 154:--heading:: t/t7810-grep.sh 785:test_expect_success 'grep --heading' ' 786: git grep --heading -e char -e lo_w hello.c hello_world >actual && 808: git grep --break --heading -n --color \ Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-05 17:24:36 +02:00			`if (opt->heading && opt->last_shown == 0) {`
			`output_color(opt, name, strlen(name), opt->color_filename);`
			`opt->output(opt, "\n", 1);`
			`}`
grep: move context hunk mark handling into show_line() Move last_shown into struct grep_opt, to make it available in show_line(), and then make the function handle the printing of hunk marks for context lines in a central place. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:02:38 +02:00			`opt->last_shown = lno;`

grep: add --heading With --heading, the filename is printed once before matches from that file instead of at the start of each line, giving more screen space to the actual search results. This option is taken from ack (http://betterthangrep.com/). And now git grep can dress up like it: $ git config alias.ack "grep --break --heading --line-number" $ git ack -e --heading Documentation/git-grep.txt 154:--heading:: t/t7810-grep.sh 785:test_expect_success 'grep --heading' ' 786: git grep --heading -e char -e lo_w hello.c hello_world >actual && 808: git grep --break --heading -n --color \ Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-05 17:24:36 +02:00			`if (!opt->heading && opt->pathname) {`
grep: Colorize filename, line number, and separator Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:46 +01:00			`output_color(opt, name, strlen(name), opt->color_filename);`
			`output_sep(opt, sign);`
Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`}`
			`if (opt->linenum) {`
			`char buf[32];`
			`snprintf(buf, sizeof(buf), "%d", lno);`
grep: Colorize filename, line number, and separator Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:46 +01:00			`output_color(opt, buf, strlen(buf), opt->color_lineno);`
			`output_sep(opt, sign);`
Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`}`
grep: color patterns in output Coloring matches makes them easier to spot in the output. Add two options and two parameters: color.grep (to turn coloring on or off), color.grep.match (to set the color of matches), --color and --no-color (to turn coloring on or off, respectively). The output of external greps is not changed. This patch is based on earlier ones by Nguyễn Thái Ngọc Duy and Thiago Alves. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:32:32 +01:00			`if (opt->color) {`
			`regmatch_t match;`
			`enum grep_context ctx = GREP_CONTEXT_BODY;`
			`int ch = *eol;`
			`int eflags = 0;`

grep: add color.grep.matchcontext and color.grep.matchselected The config option color.grep.match can be used to specify the highlighting color for matching strings. Add the options matchContext and matchSelected to allow different colors to be specified for matching strings in the context vs. in selected lines. This is similar to the ms and mc specifiers in GNU grep's environment variable GREP_COLORS. Tests are from Zoltan Klinger's earlier attempt to solve the same issue in a different way. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-10-27 19:23:05 +01:00			`if (sign == ':')`
			`match_color = opt->color_match_selected;`
			`else`
			`match_color = opt->color_match_context;`
grep: Colorize selected, context, and function lines Colorize non-matching text of selected lines, context lines, and function name lines. The default for all three is no color, but they can be configured using color.grep.<slot>. The first two are similar to the corresponding options in GNU grep, except that GNU grep applies the color to the entire line, not just non-matching text. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:47 +01:00			`if (sign == ':')`
			`line_color = opt->color_selected;`
			`else if (sign == '-')`
			`line_color = opt->color_context;`
			`else if (sign == '=')`
			`line_color = opt->color_function;`
grep: color patterns in output Coloring matches makes them easier to spot in the output. Add two options and two parameters: color.grep (to turn coloring on or off), color.grep.match (to set the color of matches), --color and --no-color (to turn coloring on or off, respectively). The output of external greps is not changed. This patch is based on earlier ones by Nguyễn Thái Ngọc Duy and Thiago Alves. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:32:32 +01:00			`*eol = '\0';`
			`while (next_match(opt, bol, eol, ctx, &match, eflags)) {`
grep: fix colouring of matches with zero length If a zero-length match is encountered, break out of loop and show the rest of the line uncoloured. Otherwise we'd be looping forever, trying to make progress by advancing the pointer by zero characters. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-01 23:53:05 +02:00			`if (match.rm_so == match.rm_eo)`
			`break;`
Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00
grep: Colorize selected, context, and function lines Colorize non-matching text of selected lines, context lines, and function name lines. The default for all three is no color, but they can be configured using color.grep.<slot>. The first two are similar to the corresponding options in GNU grep, except that GNU grep applies the color to the entire line, not just non-matching text. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:47 +01:00			`output_color(opt, bol, match.rm_so, line_color);`
grep: Colorize filename, line number, and separator Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:46 +01:00			`output_color(opt, bol + match.rm_so,`
grep: add color.grep.matchcontext and color.grep.matchselected The config option color.grep.match can be used to specify the highlighting color for matching strings. Add the options matchContext and matchSelected to allow different colors to be specified for matching strings in the context vs. in selected lines. This is similar to the ms and mc specifiers in GNU grep's environment variable GREP_COLORS. Tests are from Zoltan Klinger's earlier attempt to solve the same issue in a different way. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-10-27 19:23:05 +01:00			`match.rm_eo - match.rm_so, match_color);`
grep: color patterns in output Coloring matches makes them easier to spot in the output. Add two options and two parameters: color.grep (to turn coloring on or off), color.grep.match (to set the color of matches), --color and --no-color (to turn coloring on or off, respectively). The output of external greps is not changed. This patch is based on earlier ones by Nguyễn Thái Ngọc Duy and Thiago Alves. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:32:32 +01:00			`bol += match.rm_eo;`
			`rest -= match.rm_eo;`
			`eflags = REG_NOTBOL;`
			`}`
			`*eol = ch;`
			`}`
grep: Colorize selected, context, and function lines Colorize non-matching text of selected lines, context lines, and function name lines. The default for all three is no color, but they can be configured using color.grep.<slot>. The first two are similar to the corresponding options in GNU grep, except that GNU grep applies the color to the entire line, not just non-matching text. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:47 +01:00			`output_color(opt, bol, rest, line_color);`
Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`opt->output(opt, "\n", 1);`
grep: color patterns in output Coloring matches makes them easier to spot in the output. Add two options and two parameters: color.grep (to turn coloring on or off), color.grep.match (to set the color of matches), --color and --no-color (to turn coloring on or off, respectively). The output of external greps is not changed. This patch is based on earlier ones by Nguyễn Thái Ngọc Duy and Thiago Alves. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-07 13:32:32 +01:00			`}`
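
The coloring loop in `show_line()` above walks a line match by match: it emits the text before each match, then the match itself, advances `bol` past the match, sets `REG_NOTBOL` so that `^` no longer matches mid-line, and stops on a zero-length match to avoid looping forever. A minimal standalone sketch of that loop follows. It is not code from grep.c: `mark_matches()` is a hypothetical name, it handles a single compiled `regex_t`, and it writes `[`/`]` markers into a buffer where the real code calls `output_color()`.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of grep.c): copy `line` into `out`,
 * wrapping every match of `re` in '[' and ']'. Output is silently
 * truncated if `out` is too small.
 */
static void mark_matches(const regex_t *re, const char *line,
			 char *out, size_t outsz)
{
	const char *bol = line;
	regmatch_t m;
	int eflags = 0;
	size_t len = 0;

	if (!outsz)
		return;
	out[0] = '\0';
	while (regexec(re, bol, 1, &m, eflags) == 0) {
		int n;

		if (m.rm_so == m.rm_eo)
			break;	/* zero-length match: no progress possible */
		/* text before the match, then the match in markers */
		n = snprintf(out + len, outsz - len, "%.*s[%.*s]",
			     (int)m.rm_so, bol,
			     (int)(m.rm_eo - m.rm_so), bol + m.rm_so);
		if (n < 0 || (size_t)n >= outsz - len)
			return;	/* error or truncated: stop here */
		len += (size_t)n;
		bol += m.rm_eo;		/* continue after this match */
		eflags = REG_NOTBOL;	/* bol is no longer a line start */
	}
	/* rest of the line after the last match */
	snprintf(out + len, outsz - len, "%s", bol);
}
```

Unlike `show_line()`, this sketch takes a NUL-terminated string, so it does not need to temporarily overwrite `*eol` with `'\0'` and restore it afterwards.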

grep: enable threading with -p and -W using lazy attribute lookup Lazily load the userdiff attributes in match_funcname(). Use a separate mutex around this loading to protect the (not thread-safe) attributes machinery. This lets us re-enable threading with -p and -W while reducing the overhead caused by looking up attributes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-12 22:16:07 +01:00			`#ifndef NO_PTHREADS`
grep: make locking flag global The low-level grep code traditionally didn't care about threading, as it doesn't do any threading itself and didn't call out to other non-thread-safe code. That changed with 0579f91 (grep: enable threading with -p and -W using lazy attribute lookup, 2011-12-12), which pushed the lookup of funcname attributes (which is not thread-safe) into the low-level grep code. As a result, the low-level code learned about a new global "grep_attr_mutex" to serialize access to the attribute code. A multi-threaded caller (e.g., builtin/grep.c) is expected to initialize the mutex and set "use_threads" in the grep_opt structure. The low-level code only uses the lock if use_threads is set. However, putting the use_threads flag into the grep_opt struct is not the most logical place. Whether threading is in use is not something that matters for each call to grep_buffer, but is instead global to the whole program (i.e., if any thread is doing multi-threaded grep, every other thread, even if it thinks it is doing its own single-threaded grep, would need to use the locking). In practice, this distinction isn't a problem for us, because the only user of multi-threaded grep is "git-grep", which does nothing except call grep. This patch turns the opt->use_threads flag into a global flag. More important than the nit-picking semantic argument above is that this means that the locking functions don't need to actually have access to a grep_opt to know whether to lock. Which in turn can make adding new locks simpler, as we don't need to pass around a grep_opt. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:18:29 +01:00			`int grep_use_locks;`

grep: enable threading with -p and -W using lazy attribute lookup Lazily load the userdiff attributes in match_funcname(). Use a separate mutex around this loading to protect the (not thread-safe) attributes machinery. This lets us re-enable threading with -p and -W while reducing the overhead caused by looking up attributes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-12 22:16:07 +01:00			`/*`
			`* This lock protects access to the gitattributes machinery, which is`
			`* not thread-safe.`
			`*/`
			`pthread_mutex_t grep_attr_mutex;`

grep: make locking flag global The low-level grep code traditionally didn't care about threading, as it doesn't do any threading itself and didn't call out to other non-thread-safe code. That changed with 0579f91 (grep: enable threading with -p and -W using lazy attribute lookup, 2011-12-12), which pushed the lookup of funcname attributes (which is not thread-safe) into the low-level grep code. As a result, the low-level code learned about a new global "grep_attr_mutex" to serialize access to the attribute code. A multi-threaded caller (e.g., builtin/grep.c) is expected to initialize the mutex and set "use_threads" in the grep_opt structure. The low-level code only uses the lock if use_threads is set. However, putting the use_threads flag into the grep_opt struct is not the most logical place. Whether threading is in use is not something that matters for each call to grep_buffer, but is instead global to the whole program (i.e., if any thread is doing multi-threaded grep, every other thread, even if it thinks it is doing its own single-threaded grep, would need to use the locking). In practice, this distinction isn't a problem for us, because the only user of multi-threaded grep is "git-grep", which does nothing except call grep. This patch turns the opt->use_threads flag into a global flag. More important than the nit-picking semantic argument above is that this means that the locking functions don't need to actually have access to a grep_opt to know whether to lock. Which in turn can make adding new locks simpler, as we don't need to pass around a grep_opt. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:18:29 +01:00			`static inline void grep_attr_lock(void)`
grep: enable threading with -p and -W using lazy attribute lookup Lazily load the userdiff attributes in match_funcname(). Use a separate mutex around this loading to protect the (not thread-safe) attributes machinery. This lets us re-enable threading with -p and -W while reducing the overhead caused by looking up attributes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-12 22:16:07 +01:00			`{`
grep: make locking flag global The low-level grep code traditionally didn't care about threading, as it doesn't do any threading itself and didn't call out to other non-thread-safe code. That changed with 0579f91 (grep: enable threading with -p and -W using lazy attribute lookup, 2011-12-12), which pushed the lookup of funcname attributes (which is not thread-safe) into the low-level grep code. As a result, the low-level code learned about a new global "grep_attr_mutex" to serialize access to the attribute code. A multi-threaded caller (e.g., builtin/grep.c) is expected to initialize the mutex and set "use_threads" in the grep_opt structure. The low-level code only uses the lock if use_threads is set. However, putting the use_threads flag into the grep_opt struct is not the most logical place. Whether threading is in use is not something that matters for each call to grep_buffer, but is instead global to the whole program (i.e., if any thread is doing multi-threaded grep, every other thread, even if it thinks it is doing its own single-threaded grep, would need to use the locking). In practice, this distinction isn't a problem for us, because the only user of multi-threaded grep is "git-grep", which does nothing except call grep. This patch turns the opt->use_threads flag into a global flag. More important than the nit-picking semantic argument above is that this means that the locking functions don't need to actually have access to a grep_opt to know whether to lock. Which in turn can make adding new locks simpler, as we don't need to pass around a grep_opt. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:18:29 +01:00			`if (grep_use_locks)`
grep: enable threading with -p and -W using lazy attribute lookup Lazily load the userdiff attributes in match_funcname(). Use a separate mutex around this loading to protect the (not thread-safe) attributes machinery. This lets us re-enable threading with -p and -W while reducing the overhead caused by looking up attributes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-12 22:16:07 +01:00			`pthread_mutex_lock(&grep_attr_mutex);`
			`}`

grep: make locking flag global The low-level grep code traditionally didn't care about threading, as it doesn't do any threading itself and didn't call out to other non-thread-safe code. That changed with 0579f91 (grep: enable threading with -p and -W using lazy attribute lookup, 2011-12-12), which pushed the lookup of funcname attributes (which is not thread-safe) into the low-level grep code. As a result, the low-level code learned about a new global "grep_attr_mutex" to serialize access to the attribute code. A multi-threaded caller (e.g., builtin/grep.c) is expected to initialize the mutex and set "use_threads" in the grep_opt structure. The low-level code only uses the lock if use_threads is set. However, putting the use_threads flag into the grep_opt struct is not the most logical place. Whether threading is in use is not something that matters for each call to grep_buffer, but is instead global to the whole program (i.e., if any thread is doing multi-threaded grep, every other thread, even if it thinks it is doing its own single-threaded grep, would need to use the locking). In practice, this distinction isn't a problem for us, because the only user of multi-threaded grep is "git-grep", which does nothing except call grep. This patch turns the opt->use_threads flag into a global flag. More important than the nit-picking semantic argument above is that this means that the locking functions don't need to actually have access to a grep_opt to know whether to lock. Which in turn can make adding new locks simpler, as we don't need to pass around a grep_opt. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:18:29 +01:00			`static inline void grep_attr_unlock(void)`
grep: enable threading with -p and -W using lazy attribute lookup Lazily load the userdiff attributes in match_funcname(). Use a separate mutex around this loading to protect the (not thread-safe) attributes machinery. This lets us re-enable threading with -p and -W while reducing the overhead caused by looking up attributes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-12 22:16:07 +01:00			`{`
			`if (grep_use_locks)`
			`pthread_mutex_unlock(&grep_attr_mutex);`
			`}`
grep: move sha1-reading mutex into low-level code The multi-threaded git-grep code needs to serialize access to the thread-unsafe read_sha1_file call. It does this with a mutex that is local to builtin/grep.c. Let's instead push this down into grep.c, where it can be used by both builtin/grep.c and grep.c. This will let us safely teach the low-level grep.c code tricks that involve reading from the object db. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:18:41 +01:00
			`/*`
			`* Same as git_attr_mutex, but protecting the thread-unsafe object db access.`
			`*/`
			`pthread_mutex_t grep_read_mutex;`

			`#else`
			`#define grep_attr_lock()`
			`#define grep_attr_unlock()`
			`#endif`
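The conditional-locking pattern described in the "make locking flag global" commit message can be sketched in isolation. This is a minimal illustration, not the file's actual code: the names `demo_use_locks`, `demo_attr_mutex`, and `demo_attr_is_locked()` are hypothetical stand-ins, and the mutex is statically initialized here for brevity, whereas the commit message says a multi-threaded caller is expected to initialize it.

```c
#include <pthread.h>

/*
 * Hypothetical stand-ins for grep_use_locks and grep_attr_mutex.
 * The flag is global, so the helpers need no grep_opt argument.
 */
static int demo_use_locks;
static pthread_mutex_t demo_attr_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Take the lock only when the program has declared itself threaded. */
static inline void demo_attr_lock(void)
{
	if (demo_use_locks)
		pthread_mutex_lock(&demo_attr_mutex);
}

/* Release the lock under the same condition. */
static inline void demo_attr_unlock(void)
{
	if (demo_use_locks)
		pthread_mutex_unlock(&demo_attr_mutex);
}

/* Illustration-only probe: 1 if the mutex is currently held, else 0. */
static int demo_attr_is_locked(void)
{
	if (pthread_mutex_trylock(&demo_attr_mutex) != 0)
		return 1;
	pthread_mutex_unlock(&demo_attr_mutex);
	return 0;
}
```

With `demo_use_locks` left at 0 the lock and unlock helpers are no-ops, which mirrors the single-threaded path; a threaded caller would set the flag once before starting its workers.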

grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`static int match_funcname(struct grep_opt *opt, struct grep_source *gs, char *bol, char *eol)`
grep: add option -p/--show-function The new option -p instructs git grep to print the previous function definition as a context line, similar to diff -p. Such context lines are marked with an equal sign instead of a dash. This option complements the existing context options -A, -B, -C. Function definitions are detected using the same heuristic that diff uses. User defined regular expressions are not supported, yet. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:06:34 +02:00			`{`
			`xdemitconf_t *xecfg = opt->priv;`
			`if (xecfg && !xecfg->find_func) {`
grep: cache userdiff_driver in grep_source Right now, grep only uses the userdiff_driver for one thing: looking up funcname patterns for "-p" and "-W". As new uses for userdiff drivers are added to the grep code, we want to minimize attribute lookups, which can be expensive. It might seem at first that this would also optimize multiple lookups when the funcname pattern for a file is needed multiple times. However, the compiled funcname pattern is already cached in struct grep_opt's "priv" member, so multiple lookups are already suppressed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:20:43 +01:00			`grep_source_load_driver(gs);`
			`if (gs->driver->funcname.pattern) {`
			`const struct userdiff_funcname *pe = &gs->driver->funcname;`
			`xdiff_set_find_func(xecfg, pe->pattern, pe->cflags);`
			`} else {`
			`xecfg = opt->priv = NULL;`
			`}`
			`}`

			`if (xecfg) {`
			`char buf[1];`
			`return xecfg->find_func(bol, eol - bol, buf, 1,`
			`xecfg->find_func_priv) >= 0;`
			`}`

			`if (bol == eol)`
			`return 0;`
			`if (isalpha(*bol) || *bol == '_' || *bol == '$')`
			`return 1;`
			`return 0;`
			`}`
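When no userdiff pattern is configured, the function above falls back to a first-character heuristic. The sketch below isolates that fallback so it can be tried on its own. The name `looks_like_funcname()` is hypothetical, and the cast to `unsigned char` is an addition of this sketch (it keeps `isalpha()` well defined for bytes above 127) and is not taken from the code above.

```c
#include <ctype.h>

/*
 * Hypothetical standalone version of the fallback heuristic:
 * a non-empty line counts as a function line when its first byte
 * is a letter, an underscore, or a dollar sign.
 * bol points at the first byte of the line, eol one past its last byte.
 */
static int looks_like_funcname(const char *bol, const char *eol)
{
	if (bol == eol)
		return 0;	/* empty line */
	return isalpha((unsigned char)*bol) || *bol == '_' || *bol == '$';
}
```

Under this rule an indented statement such as a tab followed by `return 0;` is rejected because its first byte is whitespace, while an unindented `static int f(void)` is accepted.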

			`static void show_funcname_line(struct grep_opt *opt, struct grep_source *gs,`
			`char *bol, unsigned lno)`
			`{`
			`while (bol > gs->buf) {`
			`char *eol = --bol;`

			`while (bol > gs->buf && bol[-1] != '\n')`
			`bol--;`
			`lno--;`

			`if (lno <= opt->last_shown)`
			`break;`

			`if (match_funcname(opt, gs, bol, eol)) {`
			`show_line(opt, bol, eol, gs->name, lno, '=');`
			`break;`
			`}`
			`}`
			`}`
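The backward scan in `show_funcname_line()` can be illustrated on a plain buffer. The sketch below is a simplification under stated assumptions: `find_prev_funcname_line()` is a hypothetical helper, it uses only the first-character heuristic instead of calling `match_funcname()`, it returns a line number instead of printing, and it omits the `opt->last_shown` cutoff.

```c
#include <ctype.h>

/*
 * Hypothetical helper. Given buf (start of the text), bol (start of
 * line number lno, 1-based), walk backwards one line at a time and
 * return the number of the closest earlier line whose first byte is
 * a letter, '_' or '$'. Returns 0 when no such line exists.
 */
static unsigned find_prev_funcname_line(const char *buf, const char *bol,
					unsigned lno)
{
	while (bol > buf) {
		/* Step onto the newline that ends the previous line. */
		const char *eol = --bol;

		/* Walk back to the first byte of that line. */
		while (bol > buf && bol[-1] != '\n')
			bol--;
		lno--;

		if (bol != eol &&
		    (isalpha((unsigned char)*bol) || *bol == '_' || *bol == '$'))
			return lno;
	}
	return 0;
}
```

Starting from the third line of a buffer holding `int f(void)`, `{`, and an indented `return 0;`, the scan skips the brace line and stops at line 1.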

			`static void show_pre_context(struct grep_opt *opt, struct grep_source *gs,`
grep: add option to show whole function as context Add a new option, -W, to show the whole surrounding function of a match. It uses the same regular expressions as -p and diff to find the beginning of sections. Currently it will not display comments in front of a function, but those that are following one. Despite this shortcoming it is already useful, e.g. to simply see a more complete applicable context or to extract whole functions. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-01 19:20:53 +02:00			`char *bol, char *end, unsigned lno)`
grep: handle pre context lines on demand Factor out pre context line handling into the new function show_pre_context() and change the algorithm to rewind by looking for newline characters and roll forward again, instead of maintaining an array of line beginnings and ends. This is slower for hits, but the cost for non-matching lines becomes zero. Normally, there are far more non-matching lines, so the time spent in total decreases. Before this patch (current Linux kernel repo, best of five runs): $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.134s user 0m1.932s sys 0m0.196s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m12.059s user 0m11.837s sys 0m0.224s The same with this patch: $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.117s user 0m1.892s sys 0m0.228s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m2.986s user 0m2.696s sys 0m0.288s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:05:17 +02:00			`{`
			`unsigned cur = lno, from = 1, funcname_lno = 0;`
			`int funcname_needed = !!opt->funcname;`

			`if (opt->funcbody && !match_funcname(opt, gs, bol, end))`
			`funcname_needed = 2;`

			`if (opt->pre_context < lno)`
			`from = lno - opt->pre_context;`
			`if (from <= opt->last_shown)`
			`from = opt->last_shown + 1;`

			`/* Rewind. */`
			`while (bol > gs->buf &&`
			`cur > (funcname_needed == 2 ? opt->last_shown + 1 : from)) {`
			`char *eol = --bol;`

			`while (bol > gs->buf && bol[-1] != '\n')`
			`bol--;`
			`cur--;`
			`if (funcname_needed && match_funcname(opt, gs, bol, eol)) {`
			`funcname_lno = cur;`
			`funcname_needed = 0;`
			`}`
grep: handle pre context lines on demand Factor out pre context line handling into the new function show_pre_context() and change the algorithm to rewind by looking for newline characters and roll forward again, instead of maintaining an array of line beginnings and ends. This is slower for hits, but the cost for non-matching lines becomes zero. Normally, there are far more non-matching lines, so the time spent in total decreases. Before this patch (current Linux kernel repo, best of five runs): $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.134s user 0m1.932s sys 0m0.196s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m12.059s user 0m11.837s sys 0m0.224s The same with this patch: $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.117s user 0m1.892s sys 0m0.228s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m2.986s user 0m2.696s sys 0m0.288s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:05:17 +02:00			`}`

grep: add option -p/--show-function The new option -p instructs git grep to print the previous function definition as a context line, similar to diff -p. Such context lines are marked with an equal sign instead of a dash. This option complements the existing context options -A, -B, -C. Function definitions are detected using the same heuristic that diff uses. User defined regular expressions are not supported, yet. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:06:34 +02:00			`/* We need to look even further back to find a function signature. */`
			`if (opt->funcname && funcname_needed)`
grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`show_funcname_line(opt, gs, bol, cur);`
grep: add option -p/--show-function The new option -p instructs git grep to print the previous function definition as a context line, similar to diff -p. Such context lines are marked with an equal sign instead of a dash. This option complements the existing context options -A, -B, -C. Function definitions are detected using the same heuristic that diff uses. User defined regular expressions are not supported, yet. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:06:34 +02:00
grep: handle pre context lines on demand Factor out pre context line handling into the new function show_pre_context() and change the algorithm to rewind by looking for newline characters and roll forward again, instead of maintaining an array of line beginnings and ends. This is slower for hits, but the cost for non-matching lines becomes zero. Normally, there are far more non-matching lines, so the time spent in total decreases. Before this patch (current Linux kernel repo, best of five runs): $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.134s user 0m1.932s sys 0m0.196s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m12.059s user 0m11.837s sys 0m0.224s The same with this patch: $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.117s user 0m1.892s sys 0m0.228s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m2.986s user 0m2.696s sys 0m0.288s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:05:17 +02:00			`/* Back forward. */`
			`while (cur < lno) {`
grep: add option -p/--show-function The new option -p instructs git grep to print the previous function definition as a context line, similar to diff -p. Such context lines are marked with an equal sign instead of a dash. This option complements the existing context options -A, -B, -C. Function definitions are detected using the same heuristic that diff uses. User defined regular expressions are not supported, yet. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:06:34 +02:00			`char *eol = bol, sign = (cur == funcname_lno) ? '=' : '-';`
grep: handle pre context lines on demand Factor out pre context line handling into the new function show_pre_context() and change the algorithm to rewind by looking for newline characters and roll forward again, instead of maintaining an array of line beginnings and ends. This is slower for hits, but the cost for non-matching lines becomes zero. Normally, there are far more non-matching lines, so the time spent in total decreases. Before this patch (current Linux kernel repo, best of five runs): $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.134s user 0m1.932s sys 0m0.196s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m12.059s user 0m11.837s sys 0m0.224s The same with this patch: $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.117s user 0m1.892s sys 0m0.228s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m2.986s user 0m2.696s sys 0m0.288s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:05:17 +02:00
			`while (*eol != '\n')`
			`eol++;`
grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`show_line(opt, bol, eol, gs->name, cur, sign);`
grep: handle pre context lines on demand Factor out pre context line handling into the new function show_pre_context() and change the algorithm to rewind by looking for newline characters and roll forward again, instead of maintaining an array of line beginnings and ends. This is slower for hits, but the cost for non-matching lines becomes zero. Normally, there are far more non-matching lines, so the time spent in total decreases. Before this patch (current Linux kernel repo, best of five runs): $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.134s user 0m1.932s sys 0m0.196s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m12.059s user 0m11.837s sys 0m0.224s The same with this patch: $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.117s user 0m1.892s sys 0m0.228s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m2.986s user 0m2.696s sys 0m0.288s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:05:17 +02:00			`bol = eol + 1;`
			`cur++;`
			`}`
			`}`
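
The "rewind, then roll forward" approach described in the pre-context commit message can be illustrated in isolation. The following is a minimal sketch, not the code from grep.c: `rewind_lines` and its parameters are hypothetical names, and it assumes the caller passes a pointer to the beginning of the hit line inside a buffer that starts at `buf`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of grep.c): starting from the beginning
 * of the hit line, step back over at most max_lines earlier lines by
 * scanning for newline characters. Returns the beginning of the first
 * context line and stores the number of lines rewound in *rewound.
 */
static const char *rewind_lines(const char *buf, const char *bol,
				unsigned max_lines, unsigned *rewound)
{
	unsigned n = 0;

	assert(buf && bol && rewound && buf <= bol);
	while (n < max_lines && bol > buf) {
		bol--;	/* step onto the '\n' that ends the previous line */
		while (bol > buf && bol[-1] != '\n')
			bol--;	/* walk back to that line's first byte */
		n++;
	}
	*rewound = n;
	return bol;
}
```

No per-line bookkeeping is kept while scanning non-matching lines; the cost of finding context is paid only when a hit is about to be shown, which is the trade-off the commit message describes.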

grep: optimize built-in grep by skipping lines that do not hit The internal "grep" engine we use checks for hits line-by-line, instead of letting the underlying regexec()/fixmatch() routines scan for the first match from the rest of the buffer. This was a major source of overhead compared to the external grep. Introduce a "look-ahead" mechanism to find the next line that would potentially match by using regexec()/fixmatch() in the remainder of the text to skip unmatching lines, and use it when the query criteria is simple enough (i.e. punt for an advanced grep boolean expression like "lines that have both X and Y but not Z" for now) and we are not running under "-v" (aka "--invert-match") option. Note that "-L" (aka "--files-without-match") is not a reason to disable this optimization. Under the option, we are interested if the file has any hit at all, and that is what we determine reliably with or without the optimization. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-11 07:39:36 +01:00			`static int should_lookahead(struct grep_opt *opt)`
			`{`
			`struct grep_pat *p;`

			`if (opt->extended)`
			`return 0; /* punt for too complex stuff */`
			`if (opt->invert)`
			`return 0;`
			`for (p = opt->pattern_list; p; p = p->next) {`
			`if (p->token != GREP_PATTERN)`
			`return 0; /* punt for "header only" and stuff */`
			`}`
			`return 1;`
			`}`

			`static int look_ahead(struct grep_opt *opt,`
			`unsigned long *left_p,`
			`unsigned *lno_p,`
			`char **bol_p)`
			`{`
			`unsigned lno = *lno_p;`
			`char *bol = *bol_p;`
			`struct grep_pat *p;`
			`char *sp, *last_bol;`
			`regoff_t earliest = -1;`

			`for (p = opt->pattern_list; p; p = p->next) {`
			`int hit;`
			`regmatch_t m;`

grep: Put calls to fixmatch() and regmatch() into patmatch() Both match_one_pattern() and look_ahead() use fixmatch() and regmatch() in the same way. They really want to match a pattern againt a string, but now they need to know if the pattern is fixed or regexp. This change cleans this up by introducing patmatch() (from "pattern match") and also simplifies inserting other ways of matching a string. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-05 00:00:19 +02:00			`hit = patmatch(p, bol, bol + *left_p, &m, 0);`
grep: optimize built-in grep by skipping lines that do not hit The internal "grep" engine we use checks for hits line-by-line, instead of letting the underlying regexec()/fixmatch() routines scan for the first match from the rest of the buffer. This was a major source of overhead compared to the external grep. Introduce a "look-ahead" mechanism to find the next line that would potentially match by using regexec()/fixmatch() in the remainder of the text to skip unmatching lines, and use it when the query criteria is simple enough (i.e. punt for an advanced grep boolean expression like "lines that have both X and Y but not Z" for now) and we are not running under "-v" (aka "--invert-match") option. Note that "-L" (aka "--files-without-match") is not a reason to disable this optimization. Under the option, we are interested if the file has any hit at all, and that is what we determine reliably with or without the optimization. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-11 07:39:36 +01:00			`if (!hit || m.rm_so < 0 || m.rm_eo < 0)`
			`continue;`
			`if (earliest < 0 || m.rm_so < earliest)`
			`earliest = m.rm_so;`
			`}`

			`if (earliest < 0) {`
			`*bol_p = bol + *left_p;`
			`*left_p = 0;`
			`return 1;`
			`}`
			`for (sp = bol + earliest; bol < sp && sp[-1] != '\n'; sp--)`
			`; /* find the beginning of the line */`
			`last_bol = sp;`

			`for (sp = bol; sp < last_bol; sp++) {`
			`if (*sp == '\n')`
			`lno++;`
			`}`
			`*left_p -= last_bol - bol;`
			`*bol_p = last_bol;`
			`*lno_p = lno;`
			`return 0;`
			`}`
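
The look-ahead idea (find the earliest match in the rest of the buffer, rewind to the start of its line, and count the newlines that were skipped) can be sketched on its own. This is a simplified illustration under stated assumptions, not the function above: `skip_to_hit` is a hypothetical name, it handles a single fixed-string pattern via `strstr()`, and it assumes a NUL-terminated buffer rather than an explicit remaining length.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical, simplified look-ahead for one fixed-string needle in a
 * NUL-terminated buffer. Returns 1 when the rest of the buffer has no
 * match (and moves *bol_p to the end). Otherwise returns 0 after moving
 * *bol_p to the beginning of the first line containing the needle and
 * advancing *lno_p by the number of lines skipped.
 */
static int skip_to_hit(const char **bol_p, unsigned *lno_p, const char *needle)
{
	const char *bol = *bol_p;
	const char *hit, *line, *sp;

	assert(bol && lno_p && needle);
	hit = strstr(bol, needle);
	if (!hit) {
		*bol_p = bol + strlen(bol);
		return 1;
	}
	/* rewind from the match to the beginning of its line */
	for (line = hit; bol < line && line[-1] != '\n'; line--)
		;
	/* count the newlines between the old and new positions */
	for (sp = bol; sp < line; sp++)
		if (*sp == '\n')
			(*lno_p)++;
	*bol_p = line;
	return 0;
}
```

The caller still runs its normal per-line matching on the line this lands on; the sketch only skips text that cannot contain a hit.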

Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`static void std_output(struct grep_opt *opt, const void *buf, size_t size)`
			`{`
			`fwrite(buf, size, 1, stdout);`
			`}`

grep: allow to use textconv filters Recently and not so recently, we made sure that log/grep type operations use textconv filters when a userfacing diff would do the same: ef90ab6 (pickaxe: use textconv for -S counting, 2012-10-28) b1c2f57 (diff_grep: use textconv buffers for add/deleted files, 2012-10-28) 0508fe5 (combine-diff: respect textconv attributes, 2011-05-23) "git grep" currently does not use textconv filters at all, that is neither for displaying the match and context nor for the actual grepping, even when requested by --textconv. Introduce an option "--textconv" which makes git grep use any configured textconv filters for grepping and output purposes. It is off by default. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-10 17:10:15 +02:00			`static int fill_textconv_grep(struct userdiff_driver *driver,`
			`struct grep_source *gs)`
			`{`
			`struct diff_filespec *df;`
			`char *buf;`
			`size_t size;`

			`if (!driver || !driver->textconv)`
			`return grep_source_load(gs);`

			`/*`
			`* The textconv interface is intimately tied to diff_filespecs, so we`
			`* have to pretend to be one. If we could unify the grep_source`
			`* and diff_filespec structs, this mess could just go away.`
			`*/`
			`df = alloc_filespec(gs->path);`
			`switch (gs->type) {`
			`case GREP_SOURCE_SHA1:`
			`fill_filespec(df, gs->identifier, 1, 0100644);`
			`break;`
			`case GREP_SOURCE_FILE:`
			`fill_filespec(df, null_sha1, 0, 0100644);`
			`break;`
			`default:`
			`die("BUG: attempt to textconv something without a path?");`
			`}`

			`/*`
			`* fill_textconv is not remotely thread-safe; it may load objects`
			`* behind the scenes, and it modifies the global diff tempfile`
			`* structure.`
			`*/`
			`grep_read_lock();`
			`size = fill_textconv(driver, df, &buf);`
			`grep_read_unlock();`
			`free_filespec(df);`

			`/*`
			`* The normal fill_textconv usage by the diff machinery would just keep`
			`* the textconv'd buf separate from the diff_filespec. But much of the`
			`* grep code passes around a grep_source and assumes that its "buf"`
			`* pointer is the beginning of the thing we are searching. So let's`
			`* install our textconv'd version into the grep_source, taking care not`
			`* to leak any existing buffer.`
			`*/`
			`grep_source_clear_data(gs);`
			`gs->buf = buf;`
			`gs->size = size;`

			`return 0;`
			`}`

grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int collect_hits)`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`{`
grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`char *bol;`
			`unsigned long left;`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`unsigned lno = 1;`
			`unsigned last_hit = 0;`
			`int binary_match_only = 0;`
			`unsigned count = 0;`
grep: optimize built-in grep by skipping lines that do not hit The internal "grep" engine we use checks for hits line-by-line, instead of letting the underlying regexec()/fixmatch() routines scan for the first match from the rest of the buffer. This was a major source of overhead compared to the external grep. Introduce a "look-ahead" mechanism to find the next line that would potentially match by using regexec()/fixmatch() in the remainder of the text to skip unmatching lines, and use it when the query criteria is simple enough (i.e. punt for an advanced grep boolean expression like "lines that have both X and Y but not Z" for now) and we are not running under "-v" (aka "--invert-match") option. Note that "-L" (aka "--files-without-match") is not a reason to disable this optimization. Under the option, we are interested if the file has any hit at all, and that is what we determine reliably with or without the optimization. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-11 07:39:36 +01:00			`int try_lookahead = 0;`
grep: add option to show whole function as context Add a new option, -W, to show the whole surrounding function of a match. It uses the same regular expressions as -p and diff to find the beginning of sections. Currently it will not display comments in front of a function, but those that are following one. Despite this shortcoming it is already useful, e.g. to simply see a more complete applicable context or to extract whole functions. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-01 19:20:53 +02:00			`int show_function = 0;`
grep: allow to use textconv filters Recently and not so recently, we made sure that log/grep type operations use textconv filters when a userfacing diff would do the same: ef90ab6 (pickaxe: use textconv for -S counting, 2012-10-28) b1c2f57 (diff_grep: use textconv buffers for add/deleted files, 2012-10-28) 0508fe5 (combine-diff: respect textconv attributes, 2011-05-23) "git grep" currently does not use textconv filters at all, that is neither for displaying the match and context nor for the actual grepping, even when requested by --textconv. Introduce an option "--textconv" which makes git grep use any configured textconv filters for grepping and output purposes. It is off by default. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-10 17:10:15 +02:00			`struct userdiff_driver *textconv = NULL;`
Update grep internal for grepping only in head/body This further updates the built-in grep engine so that we can say something like "this pattern should match only in head". This can be used to simplify grepping in the log messages. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-20 21:39:46 +02:00			`enum grep_context ctx = GREP_CONTEXT_HEAD;`
grep -p: support user defined regular expressions Respect the userdiff attributes and config settings when looking for lines with function definitions in git grep -p. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:07:24 +02:00			`xdemitconf_t xecfg;`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00
Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`if (!opt->output)`
			`opt->output = std_output;`

grep: add option to show whole function as context Add a new option, -W, to show the whole surrounding function of a match. It uses the same regular expressions as -p and diff to find the beginning of sections. Currently it will not display comments in front of a function, but those that are following one. Despite this shortcoming it is already useful, e.g. to simply see a more complete applicable context or to extract whole functions. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-01 19:20:53 +02:00			`if (opt->pre_context || opt->post_context || opt->file_break ||`
			`opt->funcbody) {`
grep: fix coloring of hunk marks between files Commit 431d6e7b (grep: enable threading for context line printing) split the printing of the "--\n" mark between results from different files out into two places: show_line() in grep.c for the non-threaded case and work_done() in builtin/grep.c for the threaded case. Commit 55f638bd (grep: Colorize filename, line number, and separator) updated the former, but not the latter, so the separators between files are not colored if threads are used. This patch merges the two. In the threaded case, hunk marks are now printed by show_line() for every file, including the first one, and the very first mark is simply skipped in work_done(). This ensures that the output is properly colored and works just as well. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-06-05 17:24:15 +02:00			`/* Show hunk marks, except for the first file. */`
			`if (opt->last_shown)`
			`opt->show_hunk_mark = 1;`
			`/*`
			`* If we're using threads then we can't easily identify`
			`* the first file. Always put hunk marks in that case`
			`* and skip the very first one later in work_done().`
			`*/`
			`if (opt->output != std_output)`
			`opt->show_hunk_mark = 1;`
			`}`
grep: enable threading for context line printing If context lines are to be printed, grep separates them with hunk marks ("--\n"). These marks are printed between matches from different files, too. They are not printed before the first file, though. Threading was disabled when context line printing was enabled because avoiding to print the mark before the first line was an unsolved synchronisation problem. This patch separates the code for printing hunk marks for the threaded and the unthreaded case, allowing threading to be turned on together with the common -ABC options. ->show_hunk_mark, which controls printing of hunk marks between files in show_line(), is now set in grep_buffer_1(), but only if some results have already been printed and threading is disabled. The threaded case is handled in work_done(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-15 17:21:10 +01:00			`opt->last_shown = 0;`

grep: allow to use textconv filters Recently and not so recently, we made sure that log/grep type operations use textconv filters when a userfacing diff would do the same: ef90ab6 (pickaxe: use textconv for -S counting, 2012-10-28) b1c2f57 (diff_grep: use textconv buffers for add/deleted files, 2012-10-28) 0508fe5 (combine-diff: respect textconv attributes, 2011-05-23) "git grep" currently does not use textconv filters at all, that is neither for displaying the match and context nor for the actual grepping, even when requested by --textconv. Introduce an option "--textconv" which makes git grep use any configured textconv filters for grepping and output purposes. It is off by default. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-10 17:10:15 +02:00			`if (opt->allow_textconv) {`
			`grep_source_load_driver(gs);`
			`/*`
			`* We might set up the shared textconv cache data here, which`
			`* is not thread-safe.`
			`*/`
			`grep_attr_lock();`
			`textconv = userdiff_get_textconv(gs->driver);`
			`grep_attr_unlock();`
			`}`

			`/*`
			`* We know the result of a textconv is text, so we only have to care`
			`* about binary handling if we are not using it.`
			`*/`
			`if (!textconv) {`
			`switch (opt->binary) {`
			`case GREP_BINARY_DEFAULT:`
			`if (grep_source_is_binary(gs))`
			`binary_match_only = 1;`
			`break;`
			`case GREP_BINARY_NOMATCH:`
			`if (grep_source_is_binary(gs))`
			`return 0; /* Assume unmatch */`
			`break;`
			`case GREP_BINARY_TEXT:`
			`break;`
			`default:`
			`die("bug: unknown binary handling mode");`
			`}`
			`}`

			`memset(&xecfg, 0, sizeof(xecfg));`
grep: enable threading with -p and -W using lazy attribute lookup Lazily load the userdiff attributes in match_funcname(). Use a separate mutex around this loading to protect the (not thread-safe) attributes machinery. This lets us re-enable threading with -p and -W while reducing the overhead caused by looking up attributes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-12 22:16:07 +01:00			`opt->priv = &xecfg;`

grep: optimize built-in grep by skipping lines that do not hit The internal "grep" engine we use checks for hits line-by-line, instead of letting the underlying regexec()/fixmatch() routines scan for the first match from the rest of the buffer. This was a major source of overhead compared to the external grep. Introduce a "look-ahead" mechanism to find the next line that would potentially match by using regexec()/fixmatch() in the remainder of the text to skip unmatching lines, and use it when the query criteria is simple enough (i.e. punt for an advanced grep boolean expression like "lines that have both X and Y but not Z" for now) and we are not running under "-v" (aka "--invert-match") option. Note that "-L" (aka "--files-without-match") is not a reason to disable this optimization. Under the option, we are interested if the file has any hit at all, and that is what we determine reliably with or without the optimization. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-11 07:39:36 +01:00			`try_lookahead = should_lookahead(opt);`

			`if (fill_textconv_grep(textconv, gs) < 0)`
grep: load file data after checking binary-ness Usually we load each file to grep into memory, check whether it's binary, and then either grep it (the default) or not (if "-I" was given). In the "-I" case, we can skip loading the file entirely if it is marked as binary via gitattributes. On my giant 3-gigabyte media repository, doing "git grep -I foo" went from: real 0m0.712s user 0m0.044s sys 0m4.780s to: real 0m0.026s user 0m0.016s sys 0m0.020s Obviously this is an extreme example. The repo is almost entirely binary files, and you can see that we spent all of our time asking the kernel to read() the data. However, with a cold disk cache, even avoiding a few binary files can have an impact. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:21:11 +01:00			`return 0;`

grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`bol = gs->buf;`
			`left = gs->size;`
			`while (left) {`
			`char *eol, ch;`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`int hit;`

			`/*`
grep: Fix a typo in a comment Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 23:52:03 +02:00			`* look_ahead() skips quickly to the line that possibly`
			`* has the next hit; don't call it if we need to do`
			`* something more than just skipping the current line`
			`* in response to an unmatch for the current line. E.g.`
			`* inside a post-context window, we will show the current`
			`* line as a context around the previous hit when it`
			`* doesn't hit.`
			`*/`
			`if (try_lookahead`
			`&& !(last_hit`
grep: add option to show whole function as context Add a new option, -W, to show the whole surrounding function of a match. It uses the same regular expressions as -p and diff to find the beginning of sections. Currently it will not display comments in front of a function, but those that are following one. Despite this shortcoming it is already useful, e.g. to simply see a more complete applicable context or to extract whole functions. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-01 19:20:53 +02:00			`&& (show_function ||`
			`lno <= last_hit + opt->post_context))`
			`&& look_ahead(opt, &left, &lno, &bol))`
			`break;`
			`eol = end_of_line(bol, &left);`
			`ch = *eol;`
			`*eol = 0;`

Update grep internal for grepping only in head/body This further updates the built-in grep engine so that we can say something like "this pattern should match only in head". This can be used to simplify grepping in the log messages. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-20 21:39:46 +02:00			`if ((ctx == GREP_CONTEXT_HEAD) && (eol == bol))`
			`ctx = GREP_CONTEXT_BODY;`

			`hit = match_line(opt, bol, eol, ctx, collect_hits);`
			`*eol = ch;`

			`if (collect_hits)`
			`goto next_line;`

			`/* "grep -v -e foo -e bla" should list lines`
			`* that do not have either, so inversion should`
			`* be done outside.`
			`*/`
			`if (opt->invert)`
			`hit = !hit;`
			`if (opt->unmatch_name_only) {`
			`if (hit)`
			`return 0;`
			`goto next_line;`
			`}`
			`if (hit) {`
			`count++;`
			`if (opt->status_only)`
			`return 1;`
grep: --name-only over binary As with the option -c/--count, git grep with the option -l/--name-only should work the same with binary files as with text files because there is no danger of messing up the terminal with control characters from the contents of matching files. GNU grep does the same. Move the check for ->name_only before the one for binary_match_only, thus making the latter irrelevant for git grep -l. Reported-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-22 23:30:48 +02:00			`if (opt->name_only) {`
			`show_name(opt, gs->name);`
			`return 1;`
			`}`
grep: --count over binary The intent of showing the message "Binary file xyz matches" for binary files is to avoid annoying users by potentially messing up their terminals by printing control characters. In --count mode, this precaution isn't necessary. Display counts of matches if -c/--count was specified, even if -a was not given. GNU grep does the same. Moving the check for ->count before the code for handling binary file also avoids printing context lines if --count and -[ABC] were used together, so we can remove the part of the comment that mentions this behaviour. Again, GNU grep does the same. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-22 23:29:35 +02:00			`if (opt->count)`
			`goto next_line;`
			`if (binary_match_only) {`
Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`opt->output(opt, "Binary file ", 12);`
			`output_color(opt, gs->name, strlen(gs->name),`
grep: Colorize filename, line number, and separator Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:46 +01:00			`opt->color_filename);`
			`opt->output(opt, " matches\n", 9);`
			`return 1;`
			`}`
			`/* Hit at this line. If we haven't shown the`
			`* pre-context lines, we would need to show them.`
			`*/`
			`if (opt->pre_context || opt->funcbody)`
			`show_pre_context(opt, gs, bol, eol, lno);`
grep: add option -p/--show-function The new option -p instructs git grep to print the previous function definition as a context line, similar to diff -p. Such context lines are marked with an equal sign instead of a dash. This option complements the existing context options -A, -B, -C. Function definitions are detected using the same heuristic that diff uses. User defined regular expressions are not supported, yet. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:06:34 +02:00			`else if (opt->funcname)`
			`show_funcname_line(opt, gs, bol, lno);`
			`show_line(opt, bol, eol, gs->name, lno, ':');`
grep: move context hunk mark handling into show_line() Move last_shown into struct grep_opt, to make it available in show_line(), and then make the function handle the printing of hunk marks for context lines in a central place. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:02:38 +02:00			`last_hit = lno;`
grep: add option to show whole function as context Add a new option, -W, to show the whole surrounding function of a match. It uses the same regular expressions as -p and diff to find the beginning of sections. Currently it will not display comments in front of a function, but those that are following one. Despite this shortcoming it is already useful, e.g. to simply see a more complete applicable context or to extract whole functions. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-01 19:20:53 +02:00			`if (opt->funcbody)`
			`show_function = 1;`
			`goto next_line;`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`}`
grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`if (show_function && match_funcname(opt, gs, bol, eol))`
grep: add option to show whole function as context Add a new option, -W, to show the whole surrounding function of a match. It uses the same regular expressions as -p and diff to find the beginning of sections. Currently it will not display comments in front of a function, but those that are following one. Despite this shortcoming it is already useful, e.g. to simply see a more complete applicable context or to extract whole functions. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-01 19:20:53 +02:00			`show_function = 0;`
			`if (show_function ||`
			`(last_hit && lno <= last_hit + opt->post_context)) {`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`/* If the last hit is within the post context,`
			`* we need to show this line.`
			`*/`
grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`show_line(opt, bol, eol, gs->name, lno, '-');`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`}`

			`next_line:`
			`bol = eol + 1;`
			`if (!left)`
			`break;`
			`left--;`
			`lno++;`
			`}`

grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`if (collect_hits)`
			`return 0;`
grep: free expressions and patterns when done. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 01:27:10 +02:00
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`if (opt->status_only)`
			`return 0;`
			`if (opt->unmatch_name_only) {`
			`/* We did not see any hit, so we want to show this */`
grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`show_name(opt, gs->name);`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`return 1;`
			`}`

grep -p: support user defined regular expressions Respect the userdiff attributes and config settings when looking for lines with function definitions in git grep -p. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-02 00:07:24 +02:00			`xdiff_clear_find_func(&xecfg);`
			`opt->priv = NULL;`

builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`/* NEEDSWORK:`
			`* The real "grep -c foo *.c" gives many "bar.c:0" lines,`
			`* which feels mostly useless but sometimes useful. Maybe`
			`* make it another option? For now suppress them.`
			`*/`
Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`if (opt->count && count) {`
			`char buf[32];`
grep: support -h (no header) with --count Suppress printing the header (filename) with -h even if in -c/--count mode. GNU grep and OpenBSD's grep do the same. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-11 22:15:49 +01:00			`if (opt->pathname) {`
			`output_color(opt, gs->name, strlen(gs->name),`
			`opt->color_filename);`
			`output_sep(opt, ':');`
			`}`
grep: Colorize filename, line number, and separator Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-03-07 17:52:46 +01:00			`snprintf(buf, sizeof(buf), "%u\n", count);`
Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`opt->output(opt, buf, strlen(buf));`
grep: --count over binary The intent of showing the message "Binary file xyz matches" for binary files is to avoid annoying users by potentially messing up their terminals by printing control characters. In --count mode, this precaution isn't necessary. Display counts of matches if -c/--count was specified, even if -a was not given. GNU grep does the same. Moving the check for ->count before the code for handling binary file also avoids printing context lines if --count and -[ABC] were used together, so we can remove the part of the comment that mentions this behaviour. Again, GNU grep does the same. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-22 23:29:35 +02:00			`return 1;`
Threaded grep Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-25 23:51:39 +01:00			`}`
builtin-grep: make pieces of it available as library. This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-18 01:02:52 +02:00			`return !!last_hit;`
			`}`
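
The post-context test in the loop above (`last_hit && lno <= last_hit + opt->post_context`) can be illustrated in isolation. The following is a minimal sketch, not code from grep.c: `mark_lines` is a hypothetical helper, `strstr()` stands in for the real pattern matcher, and the `-p`/`-W` function handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of grep.c: classify each line of a
 * small in-memory "file" the way the hit/post-context logic does.
 *   ':'  the line matches the pattern (a hit)
 *   '-'  the line lies within post_context lines after the last hit
 *   ' '  the line would not be shown
 * marks must have room for nr + 1 bytes.
 */
static void mark_lines(const char *const *lines, size_t nr,
		       const char *pattern, unsigned post_context,
		       char *marks)
{
	unsigned last_hit = 0;	/* 1-based line number; 0 means "no hit yet" */
	size_t i;

	assert(lines && pattern && marks);
	for (i = 0; i < nr; i++) {
		unsigned lno = (unsigned)i + 1;

		if (strstr(lines[i], pattern)) {
			marks[i] = ':';
			last_hit = lno;
		} else if (last_hit && lno <= last_hit + post_context) {
			marks[i] = '-';
		} else {
			marks[i] = ' ';
		}
	}
	marks[nr] = '\0';
}
```

Because line numbers start at 1, `last_hit == 0` can double as "nothing matched yet", which is also why the function above can end with `return !!last_hit;`.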

grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`static void clr_hit_marker(struct grep_expr *x)`
			`{`
			`/* All-hit markers are meaningful only at the very top level`
			`* OR node.`
			`*/`
			`while (1) {`
			`x->hit = 0;`
			`if (x->node != GREP_NODE_OR)`
			`return;`
			`x->u.binary.left->hit = 0;`
			`x = x->u.binary.right;`
			`}`
			`}`

			`static int chk_hit_marker(struct grep_expr *x)`
			`{`
			`/* Top level nodes have hit markers. See if they all are hits */`
			`while (1) {`
			`if (x->node != GREP_NODE_OR)`
			`return x->hit;`
			`if (!x->u.binary.left->hit)`
			`return 0;`
			`x = x->u.binary.right;`
			`}`
			`}`
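
Both helpers above walk a right-leaning chain of OR nodes, touching the left child at each step and the final non-OR node last. The sketch below reproduces that walk on simplified stand-in types so it can be compiled on its own; `struct expr`, `NODE_ATOM`, `NODE_OR`, `clear_hits` and `all_hit` are illustrative names, not the real `struct grep_expr` API.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real types; only the fields used here. */
enum node_type { NODE_ATOM, NODE_OR };

struct expr {
	enum node_type node;
	unsigned hit;
	struct expr *left;	/* used only when node == NODE_OR */
	struct expr *right;	/* used only when node == NODE_OR */
};

/* Reset the hit marker of every top-level term in the OR chain. */
static void clear_hits(struct expr *x)
{
	assert(x);
	while (1) {
		x->hit = 0;
		if (x->node != NODE_OR)
			return;
		x->left->hit = 0;
		x = x->right;
	}
}

/* Return 1 only if every top-level term in the OR chain was hit. */
static int all_hit(const struct expr *x)
{
	assert(x);
	while (1) {
		if (x->node != NODE_OR)
			return x->hit != 0;
		if (!x->left->hit)
			return 0;
		x = x->right;
	}
}
```

For three patterns the chain has the shape `A OR (B OR C)`: the walk checks `A`, then `B`, then finishes on `C`, the last non-OR node.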

grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`int grep_source(struct grep_opt *opt, struct grep_source *gs)`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`{`
			`/*`
			`* we do not have to do the two-pass grep when we do not check`
			`* buffer-wide "all-match".`
			`*/`
			`if (!opt->all_match)`
grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`return grep_source_1(opt, gs, 0);`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00
			`/* Otherwise the toplevel "or" terms hit a bit differently.`
			`* We first clear hit markers from them.`
			`*/`
			`clr_hit_marker(opt->pattern_expression);`
grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`grep_source_1(opt, gs, 1);`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00
			`if (!chk_hit_marker(opt->pattern_expression))`
			`return 0;`

grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`return grep_source_1(opt, gs, 0);`
			`}`
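
The two-pass structure of the function above (a silent pass that only records which top-level terms matched, a check, then a reporting pass) can be modelled without the rest of grep.c. This is a rough sketch under simplifying assumptions: `scan` and `grep_all_match` are hypothetical names, patterns are plain substrings matched with `strstr()`, and the second pass returns a count of matching lines rather than the 0/1 result the real code returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical single pass over the lines. With collect_hits set it
 * only records which patterns matched anywhere (in seen[]) and
 * returns 0; otherwise it returns the number of lines that match at
 * least one pattern.
 */
static int scan(const char *const *lines, size_t nr_lines,
		const char *const *pats, size_t nr_pats,
		unsigned char *seen, int collect_hits)
{
	int matched_lines = 0;
	size_t i, j;

	for (i = 0; i < nr_lines; i++) {
		int hit = 0;

		for (j = 0; j < nr_pats; j++) {
			if (!strstr(lines[i], pats[j]))
				continue;
			hit = 1;
			if (collect_hits)
				seen[j] = 1;
		}
		matched_lines += hit;
	}
	return collect_hits ? 0 : matched_lines;
}

/* Report matches only if every pattern matched somewhere in the input. */
static int grep_all_match(const char *const *lines, size_t nr_lines,
			  const char *const *pats, size_t nr_pats,
			  unsigned char *seen)
{
	size_t j;

	assert(lines && pats && seen);
	memset(seen, 0, nr_pats);			/* clear the markers */
	scan(lines, nr_lines, pats, nr_pats, seen, 1);	/* pass 1: collect */
	for (j = 0; j < nr_pats; j++)
		if (!seen[j])
			return 0;			/* a term never hit */
	return scan(lines, nr_lines, pats, nr_pats, seen, 0); /* pass 2 */
}
```

The cost of the first pass is paid only when a buffer-wide condition is requested, which mirrors the early `if (!opt->all_match)` return above.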

grep: drop grep_buffer's "name" parameter Before the grep_source interface existed, grep_buffer was used by two types of callers: 1. Ones which pulled a file into a buffer, and then wanted to supply the file's name for the output (i.e., git grep). 2. Ones which really just wanted to grep a buffer (i.e., git log --grep). Callers in set (1) should now be using grep_source. Callers in set (2) always pass NULL for the "name" parameter of grep_buffer. We can therefore get rid of this now-useless parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:20:10 +01:00			`int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size)`
grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`{`
			`struct grep_source gs;`
			`int r;`

grep: stop looking at random places for .gitattributes grep searches for .gitattributes using "name" field in struct grep_source but that field is not real on-disk path name. For example, "grep pattern rev" fills the field with "rev:path", and Git looks for .gitattributes in the (non-existent but exploitable) path "rev:path" instead of "path". This patch passes real paths down to grep_source_load_driver() when: - grep on work tree - grep on the index - grep a commit (or a tag if it points to a commit) so that these cases look up .gitattributes at proper paths. .gitattributes lookup is disabled in all other cases. Initial-work-by: Jeff King <peff@peff.net> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-10-12 12:49:38 +02:00			`grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);`
grep: refactor the concept of "grep source" into an object The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:19:28 +01:00			`gs.buf = buf;`
			`gs.size = size;`

			`r = grep_source(opt, &gs);`

			`grep_source_clear(&gs);`
			`return r;`
			`}`
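
The wrapper above follows a common init / fill / use / clear pattern: a source object lives on the stack, borrows the caller's buffer, and is cleared before returning. Below is a minimal sketch of that pattern with stand-in types. `struct src`, `src_init`, `src_clear`, `src_search` and `search_buffer` are hypothetical names, the "search" is a single `memchr()` call, and the sketch assumes the caller keeps ownership of the buffer, so clearing does not free it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum src_type { SRC_FILE, SRC_BUF };

struct src {
	enum src_type type;
	char *name;		/* owned copy, or NULL */
	const char *buf;	/* borrowed from the caller for SRC_BUF */
	size_t size;
};

/* Duplicate a string, passing NULL through unchanged. */
static char *dup_or_null(const char *s)
{
	char *copy;

	if (!s)
		return NULL;
	copy = malloc(strlen(s) + 1);
	if (copy)
		strcpy(copy, s);
	return copy;
}

static void src_init(struct src *s, enum src_type type, const char *name)
{
	s->type = type;
	s->name = dup_or_null(name);
	s->buf = NULL;
	s->size = 0;
}

static void src_clear(struct src *s)
{
	free(s->name);		/* free(NULL) is a no-op */
	s->name = NULL;
	s->buf = NULL;		/* borrowed buffer: forget it, do not free it */
	s->size = 0;
}

/* Stand-in for the real search: does the buffer contain this byte? */
static int src_search(const struct src *s, char needle)
{
	return s->size && memchr(s->buf, needle, s->size) != NULL;
}

/* Convenience wrapper for callers that only have a raw buffer. */
static int search_buffer(const char *buf, size_t size, char needle)
{
	struct src s;
	int r;

	assert(buf || !size);
	src_init(&s, SRC_BUF, NULL);
	s.buf = buf;
	s.size = size;
	r = src_search(&s, needle);
	src_clear(&s);
	return r;
}
```

Keeping the buffer-only entry point as a thin wrapper means there is a single search path to maintain, while simple callers never need to know about the source object.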

			`void grep_source_init(struct grep_source *gs, enum grep_source_type type,`
grep: stop looking at random places for .gitattributes grep searches for .gitattributes using "name" field in struct grep_source but that field is not real on-disk path name. For example, "grep pattern rev" fills the field with "rev:path", and Git looks for .gitattributes in the (non-existent but exploitable) path "rev:path" instead of "path". This patch passes real paths down to grep_source_load_driver() when: - grep on work tree - grep on the index - grep a commit (or a tag if it points to a commit) so that these cases look up .gitattributes at proper paths. .gitattributes lookup is disabled in all other cases. Initial-work-by: Jeff King <peff@peff.net> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-10-12 12:49:38 +02:00			`const char *name, const char *path,`
			`const void *identifier)`
			`{`
			`gs->type = type;`
use xstrdup_or_null to replace ternary conditionals This replaces "x ? xstrdup(x) : NULL" with xstrdup_or_null(x). The change is fairly mechanical, with the exception of resolve_refdup, which can eliminate a temporary variable. There are still a few hits grepping for "?.*xstrdup", but these are of slightly different forms and cannot be converted (e.g., "x ? xstrdup(x->foo) : NULL"). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-01-13 02:59:09 +01:00			`gs->name = xstrdup_or_null(name);`
			`gs->path = xstrdup_or_null(path);`
			`gs->buf = NULL;`
			`gs->size = 0;`
grep: cache userdiff_driver in grep_source Right now, grep only uses the userdiff_driver for one thing: looking up funcname patterns for "-p" and "-W". As new uses for userdiff drivers are added to the grep code, we want to minimize attribute lookups, which can be expensive. It might seem at first that this would also optimize multiple lookups when the funcname pattern for a file is needed multiple times. However, the compiled funcname pattern is already cached in struct grep_opt's "priv" member, so multiple lookups are already suppressed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:20:43 +01:00			`gs->driver = NULL;`

			`switch (type) {`
			`case GREP_SOURCE_FILE:`
			`gs->identifier = xstrdup(identifier);`
			`break;`
			`case GREP_SOURCE_SHA1:`
			`gs->identifier = xmalloc(20);`
Use hashcpy() when copying object names We invented hashcpy() to keep the abstraction of "object name" behind it. Use it instead of calling memcpy() with hard-coded 20-byte length when moving object names between pieces of memory. Leave ppc/sha1.c as-is, because the function is about the SHA-1 hash algorithm whose output is and will always be 20 bytes. Helped-by: Michael Haggerty <mhagger@alum.mit.edu> Helped-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Sun He <sunheehnus@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-03 10:39:59 +01:00			`hashcpy(gs->identifier, identifier);`
			`break;`
			`case GREP_SOURCE_BUF:`
			`gs->identifier = NULL;`
			`}`
			`}`
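`grep_source_init()` above copies `name` and `path` with `xstrdup_or_null()`, so either argument may be NULL and the caller keeps ownership of its own strings. A minimal sketch of that NULL-tolerant duplication follows. It assumes plain `malloc` rather than Git's allocation wrappers, and `dup_or_null` is a hypothetical stand-in, not Git's helper.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a NULL-tolerant strdup: returns NULL for
 * a NULL input, otherwise a heap copy that the caller must free().
 * Assumption of this sketch: allocation failure is reported by
 * returning NULL.
 */
char *dup_or_null(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;
	len = strlen(s) + 1;	/* include the terminating NUL */
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}
```

With a helper of this shape, the ternary `x ? dup(x) : NULL` at each call site collapses into a single call.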

			`void grep_source_clear(struct grep_source *gs)`
			`{`
			`free(gs->name);`
			`gs->name = NULL;`
			`free(gs->path);`
			`gs->path = NULL;`
			`free(gs->identifier);`
			`gs->identifier = NULL;`
			`grep_source_clear_data(gs);`
			`}`

			`void grep_source_clear_data(struct grep_source *gs)`
			`{`
			`switch (gs->type) {`
			`case GREP_SOURCE_FILE:`
			`case GREP_SOURCE_SHA1:`
			`free(gs->buf);`
			`gs->buf = NULL;`
			`gs->size = 0;`
			`break;`
			`case GREP_SOURCE_BUF:`
			`/* leave user-provided buf intact */`
			`break;`
			`}`
			`}`
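`grep_source_clear_data()` frees the buffer only for source types that loaded it themselves; a caller-supplied buffer is left alone. A minimal sketch of the same ownership rule, using hypothetical names (`struct source`, `SRC_OWNED`, `SRC_BORROWED`, `source_clear_data`) that do not appear in Git:

```c
#include <stdlib.h>

/* Hypothetical types for illustration only. */
enum source_type { SRC_OWNED, SRC_BORROWED };

struct source {
	enum source_type type;
	char *buf;
	size_t size;
};

/* Release loaded data, but never free memory the caller lent us. */
void source_clear_data(struct source *s)
{
	switch (s->type) {
	case SRC_OWNED:
		free(s->buf);
		s->buf = NULL;
		s->size = 0;
		break;
	case SRC_BORROWED:
		/* the caller still owns buf; leave it untouched */
		break;
	}
}
```

Keying the decision on the type tag means the struct needs no separate "owns buffer" flag.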

			`static int grep_source_load_sha1(struct grep_source *gs)`
			`{`
			`enum object_type type;`

			`grep_read_lock();`
			`gs->buf = read_sha1_file(gs->identifier, &type, &gs->size);`
			`grep_read_unlock();`

			`if (!gs->buf)`
			`return error(_("'%s': unable to read %s"),`
			`gs->name,`
			`sha1_to_hex(gs->identifier));`
			`return 0;`
			`}`

			`static int grep_source_load_file(struct grep_source *gs)`
			`{`
			`const char *filename = gs->identifier;`
			`struct stat st;`
			`char *data;`
			`size_t size;`
			`int i;`

			`if (lstat(filename, &st) < 0) {`
			`err_ret:`
			`if (errno != ENOENT)`
			`error(_("'%s': %s"), filename, strerror(errno));`
			`return -1;`
			`}`
			`if (!S_ISREG(st.st_mode))`
			`return -1;`
			`size = xsize_t(st.st_size);`
			`i = open(filename, O_RDONLY);`
			`if (i < 0)`
			`goto err_ret;`
use xmallocz to avoid size arithmetic We frequently allocate strings as xmalloc(len + 1), where the extra 1 is for the NUL terminator. This can be done more simply with xmallocz, which also checks for integer overflow. There's no case where switching xmalloc(n+1) to xmallocz(n) is wrong; the result is the same length, and malloc made no guarantees about what was in the buffer anyway. But in some cases, we can stop manually placing NUL at the end of the allocated buffer. But that's only safe if it's clear that the contents will always fill the buffer. In each case where this patch does so, I manually examined the control flow, and I tried to err on the side of caution. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-22 23:44:28 +01:00			`data = xmallocz(size);`
			`if (st.st_size != read_in_full(i, data, size)) {`
			`error(_("'%s': short read %s"), filename, strerror(errno));`
			`close(i);`
			`free(data);`
			`return -1;`
			`}`
			`close(i);`

			`gs->buf = data;`
			`gs->size = size;`
			`return 0;`
			`}`
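`grep_source_load_file()` above follows a common pattern: `lstat()` the path, refuse anything that is not a regular file, allocate size plus one byte, and treat a short read as an error. A self-contained sketch of that pattern using only POSIX calls is shown below. `slurp_regular_file` is a hypothetical name, and the explicit read loop and overflow check are assumptions standing in for what Git's `read_in_full()` and `xmallocz()` wrappers provide.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a regular file into a freshly allocated,
 * NUL-terminated buffer. Returns 0 on success, -1 on any failure.
 */
int slurp_regular_file(const char *path, char **out, size_t *out_size)
{
	struct stat st;
	size_t size, got = 0;
	char *data;
	int fd;

	assert(path && out && out_size);
	if (lstat(path, &st) < 0 || !S_ISREG(st.st_mode))
		return -1;	/* missing, or a symlink/directory/device */
	if (st.st_size < 0 || (uintmax_t)st.st_size >= SIZE_MAX)
		return -1;	/* size + 1 would not fit in size_t */
	size = (size_t)st.st_size;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	data = malloc(size + 1);	/* extra byte for the NUL */
	if (!data) {
		close(fd);
		return -1;
	}
	while (got < size) {
		ssize_t n = read(fd, data + got, size - got);
		if (n < 0 && errno == EINTR)
			continue;	/* interrupted, retry */
		if (n <= 0)
			break;		/* error or unexpected EOF */
		got += (size_t)n;
	}
	close(fd);
	if (got != size) {	/* short read: do not hand back partial data */
		free(data);
		return -1;
	}
	data[size] = '\0';
	*out = data;
	*out_size = size;
	return 0;
}
```

Using `lstat()` rather than `stat()` means a symbolic link is rejected by the `S_ISREG` check instead of being followed.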

grep.c: make two symbols really file-scope static this time Adding a declaration at the beginning is not sufficient for obvious reasons. The definition has to be made static. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-20 23:20:09 +02:00			`static int grep_source_load(struct grep_source *gs)`
			`{`
			`if (gs->buf)`
			`return 0;`

			`switch (gs->type) {`
			`case GREP_SOURCE_FILE:`
			`return grep_source_load_file(gs);`
			`case GREP_SOURCE_SHA1:`
			`return grep_source_load_sha1(gs);`
			`case GREP_SOURCE_BUF:`
			`return gs->buf ? 0 : -1;`
			`}`
			`die("BUG: invalid grep_source type");`
grep --all-match This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-28 02:50:52 +02:00			`}`

			`void grep_source_load_driver(struct grep_source *gs)`
			`{`
			`if (gs->driver)`
			`return;`

			`grep_attr_lock();`
			`if (gs->path)`
			`gs->driver = userdiff_find_by_path(gs->path);`
			`if (!gs->driver)`
			`gs->driver = userdiff_find_by_name("default");`
			`grep_attr_unlock();`
			`}`
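Both `grep_source_load()` and `grep_source_load_driver()` above begin by checking whether the result is already cached, so repeated calls cost nothing, and the driver lookup always ends with a usable default. A minimal sketch of that load-once pattern, with hypothetical names (`struct lazy_source`, `find_driver_by_path`, `lazy_load_driver`) and a counter added purely to make the caching observable; locking is omitted:

```c
#include <stddef.h>

/* Hypothetical structure for illustration only. */
struct lazy_source {
	const char *path;	/* may be NULL: no path-based lookup possible */
	const char *driver;	/* NULL until resolved */
	int lookups;		/* counts expensive lookups, for illustration */
};

/* Hypothetical stand-in for an expensive attribute lookup. */
static const char *find_driver_by_path(struct lazy_source *s)
{
	s->lookups++;
	return NULL;	/* pretend no attribute matched this path */
}

void lazy_load_driver(struct lazy_source *s)
{
	if (s->driver)
		return;		/* already cached */
	if (s->path)
		s->driver = find_driver_by_path(s);
	if (!s->driver)
		s->driver = "default";	/* always end up with a driver */
}
```

Because the fallback is stored in the same field as a real result, later callers can dereference the driver without a NULL check.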
grep: respect diff attributes for binary-ness There is currently no way for users to tell git-grep that a particular path is or is not a binary file; instead, grep always relies on its auto-detection (or the user specifying "-a" to treat all binary-looking files like text). This patch teaches git-grep to use the same attribute lookup that is used by git-diff. We could add a new "grep" flag, but that is unnecessarily complex and unlikely to be useful. Despite the name, the "-diff" attribute (or "diff=foo" and the associated diff.foo.binary config option) are really about describing the contents of the path. It's simply historical that diff was the only thing that cared about these attributes in the past. And if this simple approach turns out to be insufficient, we still have a backwards-compatible path forward: we can add a separate "grep" attribute, and fall back to respecting "diff" if it is unset. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-02 09:21:02 +01:00
			`static int grep_source_is_binary(struct grep_source *gs)`
			`{`
			`grep_source_load_driver(gs);`
			`if (gs->driver->binary != -1)`
			`return gs->driver->binary;`

			`if (!grep_source_load(gs))`
			`return buffer_is_binary(gs->buf, gs->size);`

			`return 0;`
			`}`
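`grep_source_is_binary()` lets an attribute-driven setting win and falls back to inspecting the content only when the driver leaves the question open (`binary == -1`). A minimal sketch of that decision order is below. `looks_binary`, `is_binary` and the 8000-byte window are assumptions of this sketch, not a description of `buffer_is_binary()`.

```c
#include <stddef.h>
#include <string.h>

#define SNIFF_BYTES 8000	/* assumed window, chosen for the sketch */

/* Heuristic: a NUL byte near the start suggests binary content. */
int looks_binary(const char *buf, size_t size)
{
	if (size > SNIFF_BYTES)
		size = SNIFF_BYTES;
	return memchr(buf, 0, size) != NULL;
}

/*
 * attr_binary: -1 = unset, 0 = forced text, 1 = forced binary.
 * An explicit setting always overrides the heuristic.
 */
int is_binary(int attr_binary, const char *buf, size_t size)
{
	if (attr_binary != -1)
		return attr_binary;
	return looks_binary(buf, size);
}
```

Checking the explicit setting first also avoids loading the content at all when the answer is already known, which is the order the function above uses.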