mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-05 08:47:56 +01:00

263 lines

6.3 KiB

C

Raw Normal View History

[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`/*`
			`* Copyright (C) 2005 Junio C Hamano`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`* Copyright (C) 2010 Google Inc.`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`*/`
			`#include "cache.h"`
			`#include "diff.h"`
			`#include "diffcore.h"`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`#include "xdiff-interface.h"`
Use kwset in pickaxe Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:41:57 +02:00			`#include "kwset.h"`
diffcore-pickaxe: support case insensitive match on non-ascii Similar to the "grep -F -i" case, we can't use kws on icase search outside ascii range, so we quote the string and pass it to regcomp as a basic regexp and let regex engine deal with case sensitivity. The new test is put in t7812 instead of t4209-log-pickaxe because lib-gettext.sh might cause problems elsewhere, probably. Noticed-by: Plamen Totev <plamen.totev@abv.bg> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-25 07:22:37 +02:00			`#include "commit.h"`
			`#include "quote.h"`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`typedef int (pickaxe_fn)(mmfile_t one, mmfile_t *two,`
			`struct diff_options *o,`
			`regex_t *regexp, kwset_t kws);`

git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`struct diffgrep_cb {`
			`regex_t *regexp;`
			`int hit;`
			`};`

			`static void diffgrep_consume(void priv, char line, unsigned long len)`
			`{`
			`struct diffgrep_cb *data = priv;`
			`regmatch_t regmatch;`

			`if (line[0] != '+' && line[0] != '-')`
			`return;`
			`if (data->hit)`
			`/*`
			`* NEEDSWORK: we should have a way to terminate the`
			`* caller early.`
			`*/`
			`return;`
regex: use regexec_buf() The new regexec_buf() function operates on buffers with an explicitly specified length, rather than NUL-terminated strings. We need to use this function whenever the buffer we want to pass to regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated). Note: the original motivation for this patch was to fix a bug where `git diff -G <regex>` would crash. This patch converts more callers, though, some of which allocated to construct NUL-terminated strings, or worse, modified buffers to temporarily insert NULs while calling regexec(3). By converting them to use regexec_buf(), the code has become much cleaner. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-09-21 20:24:14 +02:00			`data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,`
			`&regmatch, 0);`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`}`

diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`static int diff_grep(mmfile_t one, mmfile_t two,`
			`struct diff_options *o,`
pickaxe: give diff_grep the same signature as has_changes Change diff_grep() to match the signature of has_changes() as a preparation for the next patch that will use function pointers to the two. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:18 +02:00			`regex_t *regexp, kwset_t kws)`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`{`
			`regmatch_t regmatch;`
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`struct diffgrep_cb ecbdata;`
			`xpparam_t xpp;`
			`xdemitconf_t xecfg;`
diffcore-pickaxe: remove unnecessary call to get_textconv() The fill_one() function is responsible for finding and filling the textconv filter as necessary, and is called by diff_grep() function that implements "git log -G<pattern>". The has_changes() function that implements "git log -S<block>" calls get_textconv() for two sides being compared, before it checks to see if it was asked to perform the pickaxe limiting. Move the code around to avoid this wastage. After has_changes() calls get_textconv() to obtain textconv for both sides, fill_one() is called to use them. By adding get_textconv() to diff_grep() and relieving fill_one() of responsibility to find the textconv filter, we can avoid calling get_textconv() twice in has_changes(). With this change it's also no longer necessary for fill_one() to modify the textconv argument, therefore pass a pointer instead of a pointer to a pointer. Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-04 22:20:29 +02:00
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`if (!one)`
regex: use regexec_buf() The new regexec_buf() function operates on buffers with an explicitly specified length, rather than NUL-terminated strings. We need to use this function whenever the buffer we want to pass to regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated). Note: the original motivation for this patch was to fix a bug where `git diff -G <regex>` would crash. This patch converts more callers, though, some of which allocated to construct NUL-terminated strings, or worse, modified buffers to temporarily insert NULs while calling regexec(3). By converting them to use regexec_buf(), the code has become much cleaner. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-09-21 20:24:14 +02:00			`return !regexec_buf(regexp, two->ptr, two->size,`
			`1, &regmatch, 0);`
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`if (!two)`
regex: use regexec_buf() The new regexec_buf() function operates on buffers with an explicitly specified length, rather than NUL-terminated strings. We need to use this function whenever the buffer we want to pass to regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated). Note: the original motivation for this patch was to fix a bug where `git diff -G <regex>` would crash. This patch converts more callers, though, some of which allocated to construct NUL-terminated strings, or worse, modified buffers to temporarily insert NULs while calling regexec(3). By converting them to use regexec_buf(), the code has become much cleaner. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-09-21 20:24:14 +02:00			`return !regexec_buf(regexp, one->ptr, one->size,`
			`1, &regmatch, 0);`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`/*`
			`* We have both sides; need to run textual diff and see if`
			`* the pattern appears on added/deleted lines.`
			`*/`
			`memset(&xpp, 0, sizeof(xpp));`
			`memset(&xecfg, 0, sizeof(xecfg));`
			`ecbdata.regexp = regexp;`
			`ecbdata.hit = 0;`
			`xecfg.ctxlen = o->context;`
			`xecfg.interhunkctxlen = o->interhunkcontext;`
react to errors in xdi_diff When we call into xdiff to perform a diff, we generally lose the return code completely. Typically by ignoring the return of our xdi_diff wrapper, but sometimes we even propagate that return value up and then ignore it later. This can lead to us silently producing incorrect diffs (e.g., "git log" might produce no output at all, not even a diff header, for a content-level diff). In practice this does not happen very often, because the typical reason for xdiff to report failure is that it malloc() failed (it uses straight malloc, and not our xmalloc wrapper). But it could also happen when xdiff triggers one our callbacks, which returns an error (e.g., outf() in builtin/rerere.c tries to report a write failure in this way). And the next patch also plans to add more failure modes. Let's notice an error return from xdiff and react appropriately. In most of the diff.c code, we can simply die(), which matches the surrounding code (e.g., that is what we do if we fail to load a file for diffing in the first place). This is not that elegant, but we are probably better off dying to let the user know there was a problem, rather than simply generating bogus output. We could also just die() directly in xdi_diff, but the callers typically have a bit more context, and can provide a better message (and if we do later decide to pass errors up, we're one step closer to doing so). There is one interesting case, which is in diff_grep(). Here if we cannot generate the diff, there is nothing to match, and we silently return "no hits". This is actually what the existing code does already, but we make it a little more explicit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-09-25 01:12:23 +02:00			`if (xdi_diff_outf(one, two, diffgrep_consume, &ecbdata, &xpp, &xecfg))`
			`return 0;`
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`return ecbdata.hit;`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`}`

diffcore-pickaxe: simplify has_changes and contains Halve the number of callsites of contains() to two using temporary variables, simplifying the code. While at it, get rid of the diff_options parameter, which became unused with 8fa4b09f. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-06 15:53:27 +02:00			`static unsigned int contains(mmfile_t mf, regex_t regexp, kwset_t kws)`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`{`
[PATCH] diffcore-pickaxe: switch to "counting" behaviour. Instead of finding old/new pair that one side has and the other side does not have the specified string, find old/new pair that contains the specified string as a substring different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-07-24 01:35:25 +02:00			`unsigned int cnt;`
diffcore-pickaxe: use memmem() Use memmem() instead of open-coding it. The system libraries usually have a much faster version than the memcmp()-loop here. Even our own fall-back in compat/, which is used on Windows, is slightly faster. The following commands were run in a Linux kernel repository and timed, the best of five results is shown: $ STRING='Ensure that the real time constraints are schedulable.' $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null On Ubuntu 8.10 x64, before (v1.6.2-rc2): 8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30952minor)pagefaults 0swaps And with the patch: 1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30645minor)pagefaults 0swaps On Fedora 10 x64, before: 8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+29268minor)pagefaults 0swaps And with the patch: 1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32253minor)pagefaults 0swaps On Windows Vista x64, before: real 0m9.204s user 0m0.000s sys 0m0.000s And with the patch: real 0m8.470s user 0m0.000s sys 0m0.000s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-03 00:00:55 +01:00			`unsigned long sz;`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`const char *data;`
[PATCH] diffcore-pickaxe: switch to "counting" behaviour. Instead of finding old/new pair that one side has and the other side does not have the specified string, find old/new pair that contains the specified string as a substring different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-07-24 01:35:25 +02:00
pickaxe: use textconv for -S counting We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net> 2012-10-28 13:27:12 +01:00			`sz = mf->size;`
			`data = mf->ptr;`
[PATCH] diffcore-pickaxe: switch to "counting" behaviour. Instead of finding old/new pair that one side has and the other side does not have the specified string, find old/new pair that contains the specified string as a substring different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-07-24 01:35:25 +02:00			`cnt = 0;`

Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`if (regexp) {`
			`regmatch_t regmatch;`
			`int flags = 0;`

pickaxe: fix segfault with '-S<...> --pickaxe-regex' 'git {log,diff,...} -S<...> --pickaxe-regex' can segfault as a result of out-of-bounds memory reads. diffcore-pickaxe.c:contains() looks for all matches of the given regex in a buffer in a loop, advancing the buffer pointer to the end of the last match in each iteration. When we switched to REG_STARTEND in b7d36ffca (regex: use regexec_buf(), 2016-09-21), we started passing the size of that buffer to the regexp engine, too. Unfortunately, this buffer size is never updated on subsequent iterations, and as the buffer pointer advances on each iteration, this "bufptr+bufsize" points past the end of the buffer. This results in segmentation fault, if that memory can't be accessed. In case of 'git log' it can also result in erroneously listed commits, if the memory past the end of buffer is accessible and happens to contain data matching the regex. Reduce the buffer size on each iteration as the buffer pointer is advanced, thus maintaining the correct end of buffer location. Furthermore, make sure that the buffer pointer is not dereferenced in the control flow statements when we already reached the end of the buffer. The new test is flaky, I've never seen it fail on my Linux box even without the fix, but this is expected according to db5dfa3 (regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails, 2016-09-21). However, it did fail on Travis CI with the first (and incomplete) version of the fix, and based on that commit message I would expect the new test without the fix to fail most of the time on Windows. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-18 19:24:08 +01:00			`while (sz && *data &&`
regex: use regexec_buf() The new regexec_buf() function operates on buffers with an explicitly specified length, rather than NUL-terminated strings. We need to use this function whenever the buffer we want to pass to regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated). Note: the original motivation for this patch was to fix a bug where `git diff -G <regex>` would crash. This patch converts more callers, though, some of which allocated to construct NUL-terminated strings, or worse, modified buffers to temporarily insert NULs while calling regexec(3). By converting them to use regexec_buf(), the code has become much cleaner. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-09-21 20:24:14 +02:00			`!regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`flags \|= REG_NOTBOL;`
pickaxe: count regex matches only once When --pickaxe-regex is used, forward past the end of matches instead of advancing to the byte after their start. This way matches count only once, even if the regular expression matches their tail -- like in the fixed-string fork of the code. E.g.: /.*/ used to count the number of bytes instead of the number of lines. /aa/ resulted in a count of two in "aaa" instead of one. Also document the fact that regexec() needs a NUL-terminated string as its second argument by adding an assert(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-16 19:38:42 +01:00			`data += regmatch.rm_eo;`
pickaxe: fix segfault with '-S<...> --pickaxe-regex' 'git {log,diff,...} -S<...> --pickaxe-regex' can segfault as a result of out-of-bounds memory reads. diffcore-pickaxe.c:contains() looks for all matches of the given regex in a buffer in a loop, advancing the buffer pointer to the end of the last match in each iteration. When we switched to REG_STARTEND in b7d36ffca (regex: use regexec_buf(), 2016-09-21), we started passing the size of that buffer to the regexp engine, too. Unfortunately, this buffer size is never updated on subsequent iterations, and as the buffer pointer advances on each iteration, this "bufptr+bufsize" points past the end of the buffer. This results in segmentation fault, if that memory can't be accessed. In case of 'git log' it can also result in erroneously listed commits, if the memory past the end of buffer is accessible and happens to contain data matching the regex. Reduce the buffer size on each iteration as the buffer pointer is advanced, thus maintaining the correct end of buffer location. Furthermore, make sure that the buffer pointer is not dereferenced in the control flow statements when we already reached the end of the buffer. The new test is flaky, I've never seen it fail on my Linux box even without the fix, but this is expected according to db5dfa3 (regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails, 2016-09-21). However, it did fail on Travis CI with the first (and incomplete) version of the fix, and based on that commit message I would expect the new test without the fix to fail most of the time on Windows. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-18 19:24:08 +01:00			`sz -= regmatch.rm_eo;`
			`if (sz && *data && regmatch.rm_so == regmatch.rm_eo) {`
pickaxe: count regex matches only once When --pickaxe-regex is used, forward past the end of matches instead of advancing to the byte after their start. This way matches count only once, even if the regular expression matches their tail -- like in the fixed-string fork of the code. E.g.: /.*/ used to count the number of bytes instead of the number of lines. /aa/ resulted in a count of two in "aaa" instead of one. Also document the fact that regexec() needs a NUL-terminated string as its second argument by adding an assert(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-16 19:38:42 +01:00			`data++;`
pickaxe: fix segfault with '-S<...> --pickaxe-regex' 'git {log,diff,...} -S<...> --pickaxe-regex' can segfault as a result of out-of-bounds memory reads. diffcore-pickaxe.c:contains() looks for all matches of the given regex in a buffer in a loop, advancing the buffer pointer to the end of the last match in each iteration. When we switched to REG_STARTEND in b7d36ffca (regex: use regexec_buf(), 2016-09-21), we started passing the size of that buffer to the regexp engine, too. Unfortunately, this buffer size is never updated on subsequent iterations, and as the buffer pointer advances on each iteration, this "bufptr+bufsize" points past the end of the buffer. This results in segmentation fault, if that memory can't be accessed. In case of 'git log' it can also result in erroneously listed commits, if the memory past the end of buffer is accessible and happens to contain data matching the regex. Reduce the buffer size on each iteration as the buffer pointer is advanced, thus maintaining the correct end of buffer location. Furthermore, make sure that the buffer pointer is not dereferenced in the control flow statements when we already reached the end of the buffer. The new test is flaky, I've never seen it fail on my Linux box even without the fix, but this is expected according to db5dfa3 (regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails, 2016-09-21). However, it did fail on Travis CI with the first (and incomplete) version of the fix, and based on that commit message I would expect the new test without the fix to fail most of the time on Windows. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-03-18 19:24:08 +01:00			`sz--;`
			`}`
[PATCH] diffcore-pickaxe: switch to "counting" behaviour. Instead of finding old/new pair that one side has and the other side does not have the specified string, find old/new pair that contains the specified string as a substring different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-07-24 01:35:25 +02:00			`cnt++;`
			`}`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00
			`} else { /* Classic exact string match */`
diffcore-pickaxe: use memmem() Use memmem() instead of open-coding it. The system libraries usually have a much faster version than the memcmp()-loop here. Even our own fall-back in compat/, which is used on Windows, is slightly faster. The following commands were run in a Linux kernel repository and timed, the best of five results is shown: $ STRING='Ensure that the real time constraints are schedulable.' $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null On Ubuntu 8.10 x64, before (v1.6.2-rc2): 8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30952minor)pagefaults 0swaps And with the patch: 1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30645minor)pagefaults 0swaps On Fedora 10 x64, before: 8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+29268minor)pagefaults 0swaps And with the patch: 1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32253minor)pagefaults 0swaps On Windows Vista x64, before: real 0m9.204s user 0m0.000s sys 0m0.000s And with the patch: real 0m8.470s user 0m0.000s sys 0m0.000s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-03 00:00:55 +01:00			`while (sz) {`
pickaxe: pass diff_options to contains and has_changes Remove the unused parameter needle from contains() and has_changes(). Also replace the parameter len with a pointer to the diff_options. We can use its member pickaxe to check if the needle is an empty string and use the kwsmatch structure to find out the length of the match instead. This change is done as a preparation to unify the signatures of has_changes() and diff_grep(), which will be used in the patch after the next one to factor out common code. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:06 +02:00			`struct kwsmatch kwsm;`
			`size_t offset = kwsexec(kws, data, sz, &kwsm);`
Use kwset in pickaxe Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:41:57 +02:00			`if (offset == -1)`
diffcore-pickaxe: use memmem() Use memmem() instead of open-coding it. The system libraries usually have a much faster version than the memcmp()-loop here. Even our own fall-back in compat/, which is used on Windows, is slightly faster. The following commands were run in a Linux kernel repository and timed, the best of five results is shown: $ STRING='Ensure that the real time constraints are schedulable.' $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null On Ubuntu 8.10 x64, before (v1.6.2-rc2): 8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30952minor)pagefaults 0swaps And with the patch: 1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30645minor)pagefaults 0swaps On Fedora 10 x64, before: 8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+29268minor)pagefaults 0swaps And with the patch: 1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32253minor)pagefaults 0swaps On Windows Vista x64, before: real 0m9.204s user 0m0.000s sys 0m0.000s And with the patch: real 0m8.470s user 0m0.000s sys 0m0.000s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-03 00:00:55 +01:00			`break;`
pickaxe: simplify kwset loop in contains() Inlining the variable "found" actually makes the code shorter and easier to read. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-22 18:16:00 +01:00			`sz -= offset + kwsm.size[0];`
			`data += offset + kwsm.size[0];`
diffcore-pickaxe: use memmem() Use memmem() instead of open-coding it. The system libraries usually have a much faster version than the memcmp()-loop here. Even our own fall-back in compat/, which is used on Windows, is slightly faster. The following commands were run in a Linux kernel repository and timed, the best of five results is shown: $ STRING='Ensure that the real time constraints are schedulable.' $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null On Ubuntu 8.10 x64, before (v1.6.2-rc2): 8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30952minor)pagefaults 0swaps And with the patch: 1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30645minor)pagefaults 0swaps On Fedora 10 x64, before: 8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+29268minor)pagefaults 0swaps And with the patch: 1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32253minor)pagefaults 0swaps On Windows Vista x64, before: real 0m9.204s user 0m0.000s sys 0m0.000s And with the patch: real 0m8.470s user 0m0.000s sys 0m0.000s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-03 00:00:55 +01:00			`cnt++;`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`}`
[PATCH] diffcore-pickaxe: switch to "counting" behaviour. Instead of finding old/new pair that one side has and the other side does not have the specified string, find old/new pair that contains the specified string as a substring different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-07-24 01:35:25 +02:00			`}`
			`return cnt;`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`}`

diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`static int has_changes(mmfile_t one, mmfile_t two,`
			`struct diff_options *o,`
pickaxe: pass diff_options to contains and has_changes Remove the unused parameter needle from contains() and has_changes(). Also replace the parameter len with a pointer to the diff_options. We can use its member pickaxe to check if the needle is an empty string and use the kwsmatch structure to find out the length of the match instead. This change is done as a preparation to unify the signatures of has_changes() and diff_grep(), which will be used in the patch after the next one to factor out common code. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:06 +02:00			`regex_t *regexp, kwset_t kws)`
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`{`
diffcore-pickaxe: simplify has_changes and contains Halve the number of callsites of contains() to two using temporary variables, simplifying the code. While at it, get rid of the diff_options parameter, which became unused with 8fa4b09f. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-06 15:53:27 +02:00			`unsigned int one_contains = one ? contains(one, regexp, kws) : 0;`
			`unsigned int two_contains = two ? contains(two, regexp, kws) : 0;`
			`return one_contains != two_contains;`
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`}`

			`static int pickaxe_match(struct diff_filepair p, struct diff_options o,`
			`regex_t *regexp, kwset_t kws, pickaxe_fn fn)`
pickaxe: factor out has_changes Move duplicate if/else construct into its own helper function. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:26:24 +02:00			`{`
diffcore-pickaxe: remove unnecessary call to get_textconv() The fill_one() function is responsible for finding and filling the textconv filter as necessary, and is called by diff_grep() function that implements "git log -G<pattern>". The has_changes() function that implements "git log -S<block>" calls get_textconv() for two sides being compared, before it checks to see if it was asked to perform the pickaxe limiting. Move the code around to avoid this wastage. After has_changes() calls get_textconv() to obtain textconv for both sides, fill_one() is called to use them. By adding get_textconv() to diff_grep() and relieving fill_one() of responsibility to find the textconv filter, we can avoid calling get_textconv() twice in has_changes(). With this change it's also no longer necessary for fill_one() to modify the textconv argument, therefore pass a pointer instead of a pointer to a pointer. Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-04 22:20:29 +02:00			`struct userdiff_driver *textconv_one = NULL;`
			`struct userdiff_driver *textconv_two = NULL;`
pickaxe: use textconv for -S counting We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net> 2012-10-28 13:27:12 +01:00			`mmfile_t mf1, mf2;`
			`int ret;`

diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`/* ignore unmerged */`
			`if (!DIFF_FILE_VALID(p->one) && !DIFF_FILE_VALID(p->two))`
			`return 0;`

diffcore: add a pickaxe option to find a specific blob Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? or [1]) One might be tempted to extend git-describe to also work with blobs, such that `git describe <blob-id>` gives a description as '<commit-ish>:<path>'. This was implemented at [2]; as seen by the sheer number of responses (>110), it turns out this is tricky to get right. The hard part to get right is picking the correct 'commit-ish' as that could be the commit that (re-)introduced the blob or the blob that removed the blob; the blob could exist in different branches. Junio hinted at a different approach of solving this problem, which this patch implements. Teach the diff machinery another flag for restricting the information to what is shown. For example: $ ./git log --oneline --find-object=v2.0.0:Makefile b2feb64309 Revert the whole "ask curl-config" topic for now 47fbfded53 i18n: only extract comments marked with "TRANSLATORS:" we observe that the Makefile as shipped with 2.0 was appeared in v1.9.2-471-g47fbfded53 and in v2.0.0-rc1-5-gb2feb6430b. The reason why these commits both occur prior to v2.0.0 are evil merges that are not found using this new mechanism. [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob [2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/ Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-01-04 23:50:42 +01:00			`if (o->objfind) {`
			`return (DIFF_FILE_VALID(p->one) &&`
			`oidset_contains(o->objfind, &p->one->oid)) \|\|`
			`(DIFF_FILE_VALID(p->two) &&`
			`oidset_contains(o->objfind, &p->two->oid));`
			`}`

			`if (!o->pickaxe[0])`
			`return 0;`

diff: make struct diff_flags members lowercase Now that the flags stored in struct diff_flags are being accessed directly and not through macros, change all struct members from being uppercase to lowercase. This conversion is done using the following semantic patch: @@ expression E; @@ - E.RECURSIVE + E.recursive @@ expression E; @@ - E.TREE_IN_RECURSIVE + E.tree_in_recursive @@ expression E; @@ - E.BINARY + E.binary @@ expression E; @@ - E.TEXT + E.text @@ expression E; @@ - E.FULL_INDEX + E.full_index @@ expression E; @@ - E.SILENT_ON_REMOVE + E.silent_on_remove @@ expression E; @@ - E.FIND_COPIES_HARDER + E.find_copies_harder @@ expression E; @@ - E.FOLLOW_RENAMES + E.follow_renames @@ expression E; @@ - E.RENAME_EMPTY + E.rename_empty @@ expression E; @@ - E.HAS_CHANGES + E.has_changes @@ expression E; @@ - E.QUICK + E.quick @@ expression E; @@ - E.NO_INDEX + E.no_index @@ expression E; @@ - E.ALLOW_EXTERNAL + E.allow_external @@ expression E; @@ - E.EXIT_WITH_STATUS + E.exit_with_status @@ expression E; @@ - E.REVERSE_DIFF + E.reverse_diff @@ expression E; @@ - E.CHECK_FAILED + E.check_failed @@ expression E; @@ - E.RELATIVE_NAME + E.relative_name @@ expression E; @@ - E.IGNORE_SUBMODULES + E.ignore_submodules @@ expression E; @@ - E.DIRSTAT_CUMULATIVE + E.dirstat_cumulative @@ expression E; @@ - E.DIRSTAT_BY_FILE + E.dirstat_by_file @@ expression E; @@ - E.ALLOW_TEXTCONV + E.allow_textconv @@ expression E; @@ - E.TEXTCONV_SET_VIA_CMDLINE + E.textconv_set_via_cmdline @@ expression E; @@ - E.DIFF_FROM_CONTENTS + E.diff_from_contents @@ expression E; @@ - E.DIRTY_SUBMODULES + E.dirty_submodules @@ expression E; @@ - E.IGNORE_UNTRACKED_IN_SUBMODULES + E.ignore_untracked_in_submodules @@ expression E; @@ - E.IGNORE_DIRTY_SUBMODULES + E.ignore_dirty_submodules @@ expression E; @@ - E.OVERRIDE_SUBMODULE_CONFIG + E.override_submodule_config @@ expression E; @@ - E.DIRSTAT_BY_LINE + E.dirstat_by_line @@ expression E; @@ - E.FUNCCONTEXT + E.funccontext @@ expression E; @@ - E.PICKAXE_IGNORE_CASE + E.pickaxe_ignore_case @@ expression E; @@ - E.DEFAULT_FOLLOW_RENAMES + E.default_follow_renames Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-10-31 19:19:11 +01:00			`if (o->flags.allow_textconv) {`
diffcore-pickaxe: respect --no-textconv git log -S doesn't respect --no-textconv: $ echo '*.txt diff=wrong' > .gitattributes $ git -c diff.wrong.textconv='xxx' log --no-textconv -Sfoo error: cannot run xxx: No such file or directory fatal: unable to read files to diff Reported-by: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 15:16:30 +02:00			`textconv_one = get_textconv(p->one);`
			`textconv_two = get_textconv(p->two);`
			`}`
diffcore-pickaxe: remove unnecessary call to get_textconv() The fill_one() function is responsible for finding and filling the textconv filter as necessary, and is called by diff_grep() function that implements "git log -G<pattern>". The has_changes() function that implements "git log -S<block>" calls get_textconv() for two sides being compared, before it checks to see if it was asked to perform the pickaxe limiting. Move the code around to avoid this wastage. After has_changes() calls get_textconv() to obtain textconv for both sides, fill_one() is called to use them. By adding get_textconv() to diff_grep() and relieving fill_one() of responsibility to find the textconv filter, we can avoid calling get_textconv() twice in has_changes(). With this change it's also no longer necessary for fill_one() to modify the textconv argument, therefore pass a pointer instead of a pointer to a pointer. Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-04 22:20:29 +02:00
pickaxe: use textconv for -S counting We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net> 2012-10-28 13:27:12 +01:00			`/*`
			`* If we have an unmodified pair, we know that the count will be the`
			`* same and don't even have to load the blobs. Unless textconv is in`
			`* play, _and_ we are using two different textconv filters (e.g.,`
			`* because a pair is an exact rename with different textconv attributes`
			`* for each side, which might generate different content).`
			`*/`
			`if (textconv_one == textconv_two && diff_unmodified_pair(p))`
			`return 0;`

diffcore-pickaxe: remove fill_one() fill_one is _almost_ identical to just calling fill_textconv; the exception is that for the !DIFF_FILE_VALID case, fill_textconv gives us an empty buffer rather than a NULL one. Since we currently use the NULL pointer as a signal that the file is not present on one side of the diff, we must now switch to using DIFF_FILE_VALID to make the same check. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 02:08:47 +02:00			`mf1.size = fill_textconv(textconv_one, p->one, &mf1.ptr);`
			`mf2.size = fill_textconv(textconv_two, p->two, &mf2.ptr);`
pickaxe: use textconv for -S counting We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net> 2012-10-28 13:27:12 +01:00
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`ret = fn(DIFF_FILE_VALID(p->one) ? &mf1 : NULL,`
			`DIFF_FILE_VALID(p->two) ? &mf2 : NULL,`
			`o, regexp, kws);`
pickaxe: use textconv for -S counting We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net> 2012-10-28 13:27:12 +01:00
			`if (textconv_one)`
			`free(mf1.ptr);`
			`if (textconv_two)`
			`free(mf2.ptr);`
			`diff_free_filespec_data(p->one);`
			`diff_free_filespec_data(p->two);`

			`return ret;`
pickaxe: factor out has_changes Move duplicate if/else construct into its own helper function. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:26:24 +02:00			`}`

pickaxe: move pickaxe() after pickaxe_match() pickaxe() calls pickaxe_match(); moving the definition of the former after the latter allows us to do without an explicit function declaration. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-22 18:15:58 +01:00			`static void pickaxe(struct diff_queue_struct q, struct diff_options o,`
			`regex_t *regexp, kwset_t kws, pickaxe_fn fn)`
			`{`
			`int i;`
			`struct diff_queue_struct outq;`

			`DIFF_QUEUE_CLEAR(&outq);`

			`if (o->pickaxe_opts & DIFF_PICKAXE_ALL) {`
			`/* Showing the whole changeset if needle exists */`
			`for (i = 0; i < q->nr; i++) {`
			`struct diff_filepair *p = q->queue[i];`
			`if (pickaxe_match(p, o, regexp, kws, fn))`
			`return; /* do not munge the queue */`
			`}`

			`/*`
			`* Otherwise we will clear the whole queue by copying`
			`* the empty outq at the end of this function, but`
			`* first clear the current entries in the queue.`
			`*/`
			`for (i = 0; i < q->nr; i++)`
			`diff_free_filepair(q->queue[i]);`
			`} else {`
			`/* Showing only the filepairs that has the needle */`
			`for (i = 0; i < q->nr; i++) {`
			`struct diff_filepair *p = q->queue[i];`
			`if (pickaxe_match(p, o, regexp, kws, fn))`
			`diff_q(&outq, p);`
			`else`
			`diff_free_filepair(p);`
			`}`
			`}`

			`free(q->queue);`
			`*q = outq;`
			`}`

diffcore-pickaxe: Add regcomp_or_die() There's another regcomp code block coming in this function that needs the same error handling. This function can help avoid duplicating error handling code. Helped-by: Jeff King <peff@peff.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-25 07:22:36 +02:00			`static void regcomp_or_die(regex_t regex, const char needle, int cflags)`
			`{`
			`int err = regcomp(regex, needle, cflags);`
			`if (err) {`
			`/* The POSIX.2 people are surely sick */`
			`char errbuf[1024];`
			`regerror(err, regex, errbuf, 1024);`
			`regfree(regex);`
			`die("invalid regex: %s", errbuf);`
			`}`
			`}`

pickaxe: merge diffcore_pickaxe_grep() and diffcore_pickaxe_count() into diffcore_pickaxe() diffcore_pickaxe_count() initializes the regular expression or kwset for the search term, calls pickaxe() with the callback has_changes() and cleans up afterwards. diffcore_pickaxe_grep() does the same, only it doesn't support kwset and uses the callback diff_grep() instead. Merge the two functions to form the new diffcore_pickaxe() and thus get rid of the duplicate regex setup and cleanup code. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-22 18:15:57 +01:00			`void diffcore_pickaxe(struct diff_options *o)`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`{`
diff: pass the entire diff-options to diffcore_pickaxe() That would make it easier to give enhanced feature to the pickaxe transformation. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-31 22:44:39 +02:00			`const char *needle = o->pickaxe;`
			`int opts = o->pickaxe_opts;`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`regex_t regex, *regexp = NULL;`
Use kwset in pickaxe Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:41:57 +02:00			`kwset_t kws = NULL;`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00
pickaxe: merge diffcore_pickaxe_grep() and diffcore_pickaxe_count() into diffcore_pickaxe() diffcore_pickaxe_count() initializes the regular expression or kwset for the search term, calls pickaxe() with the callback has_changes() and cleans up afterwards. diffcore_pickaxe_grep() does the same, only it doesn't support kwset and uses the callback diff_grep() instead. Merge the two functions to form the new diffcore_pickaxe() and thus get rid of the duplicate regex setup and cleanup code. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-22 18:15:57 +01:00			`if (opts & (DIFF_PICKAXE_REGEX \| DIFF_PICKAXE_KIND_G)) {`
pickaxe: honor -i when used with -S and --pickaxe-regex accccde4 (pickaxe: allow -i to search in patch case-insensitively) allowed case-insenitive matching for -G and -S, but for the latter only if fixed string matching is used. Allow it for -S and regular expression matching as well to make the support complete. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-22 18:15:56 +01:00			`int cflags = REG_EXTENDED \| REG_NEWLINE;`
diff: migrate diff_flags.pickaxe_ignore_case to a pickaxe_opts bit Currently flags for pickaxing are found in different places. Unify the flags into the `pickaxe_opts` field, which will contain any pickaxe related flags. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-01-04 23:50:40 +01:00			`if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE)`
pickaxe: honor -i when used with -S and --pickaxe-regex accccde4 (pickaxe: allow -i to search in patch case-insensitively) allowed case-insenitive matching for -G and -S, but for the latter only if fixed string matching is used. Allow it for -S and regular expression matching as well to make the support complete. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-22 18:15:56 +01:00			`cflags \|= REG_ICASE;`
diffcore-pickaxe: Add regcomp_or_die() There's another regcomp code block coming in this function that needs the same error handling. This function can help avoid duplicating error handling code. Helped-by: Jeff King <peff@peff.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-25 07:22:36 +02:00			`regcomp_or_die(&regex, needle, cflags);`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`regexp = &regex;`
diffcore: add a pickaxe option to find a specific blob Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? or [1]) One might be tempted to extend git-describe to also work with blobs, such that `git describe <blob-id>` gives a description as '<commit-ish>:<path>'. This was implemented at [2]; as seen by the sheer number of responses (>110), it turns out this is tricky to get right. The hard part to get right is picking the correct 'commit-ish' as that could be the commit that (re-)introduced the blob or the blob that removed the blob; the blob could exist in different branches. Junio hinted at a different approach of solving this problem, which this patch implements. Teach the diff machinery another flag for restricting the information to what is shown. For example: $ ./git log --oneline --find-object=v2.0.0:Makefile b2feb64309 Revert the whole "ask curl-config" topic for now 47fbfded53 i18n: only extract comments marked with "TRANSLATORS:" we observe that the Makefile as shipped with 2.0 was appeared in v1.9.2-471-g47fbfded53 and in v2.0.0-rc1-5-gb2feb6430b. The reason why these commits both occur prior to v2.0.0 are evil merges that are not found using this new mechanism. [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob [2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/ Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-01-04 23:50:42 +01:00			`} else if (opts & DIFF_PICKAXE_KIND_S) {`
			`if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE &&`
			`has_non_ascii(needle)) {`
			`struct strbuf sb = STRBUF_INIT;`
			`int cflags = REG_NEWLINE \| REG_ICASE;`

			`basic_regex_quote_buf(&sb, needle);`
			`regcomp_or_die(&regex, sb.buf, cflags);`
			`strbuf_release(&sb);`
			`regexp = &regex;`
			`} else {`
			`kws = kwsalloc(o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE`
			`? tolower_trans_tbl : NULL);`
			`kwsincr(kws, needle, strlen(needle));`
			`kwsprep(kws);`
			`}`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`}`

pickaxe: merge diffcore_pickaxe_grep() and diffcore_pickaxe_count() into diffcore_pickaxe() diffcore_pickaxe_count() initializes the regular expression or kwset for the search term, calls pickaxe() with the callback has_changes() and cleans up afterwards. diffcore_pickaxe_grep() does the same, only it doesn't support kwset and uses the callback diff_grep() instead. Merge the two functions to form the new diffcore_pickaxe() and thus get rid of the duplicate regex setup and cleanup code. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-22 18:15:57 +01:00			`pickaxe(&diff_queued_diff, o, regexp, kws,`
			`(opts & DIFF_PICKAXE_KIND_G) ? diff_grep : has_changes);`
pickaxe: plug regex/kws leak With -S... --pickaxe-all, free the regex or the kws before returning even if we found a match. Also get rid of the variable has_changes, as we can simply break out of the loop. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:23:11 +02:00
pickaxe: merge diffcore_pickaxe_grep() and diffcore_pickaxe_count() into diffcore_pickaxe() diffcore_pickaxe_count() initializes the regular expression or kwset for the search term, calls pickaxe() with the callback has_changes() and cleans up afterwards. diffcore_pickaxe_grep() does the same, only it doesn't support kwset and uses the callback diff_grep() instead. Merge the two functions to form the new diffcore_pickaxe() and thus get rid of the duplicate regex setup and cleanup code. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-22 18:15:57 +01:00			`if (regexp)`
			`regfree(regexp);`
diffcore: add a pickaxe option to find a specific blob Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? or [1]) One might be tempted to extend git-describe to also work with blobs, such that `git describe <blob-id>` gives a description as '<commit-ish>:<path>'. This was implemented at [2]; as seen by the sheer number of responses (>110), it turns out this is tricky to get right. The hard part to get right is picking the correct 'commit-ish' as that could be the commit that (re-)introduced the blob or the blob that removed the blob; the blob could exist in different branches. Junio hinted at a different approach of solving this problem, which this patch implements. Teach the diff machinery another flag for restricting the information to what is shown. For example: $ ./git log --oneline --find-object=v2.0.0:Makefile b2feb64309 Revert the whole "ask curl-config" topic for now 47fbfded53 i18n: only extract comments marked with "TRANSLATORS:" we observe that the Makefile as shipped with 2.0 was appeared in v1.9.2-471-g47fbfded53 and in v2.0.0-rc1-5-gb2feb6430b. The reason why these commits both occur prior to v2.0.0 are evil merges that are not found using this new mechanism. [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob [2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/ Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-01-04 23:50:42 +01:00			`if (kws)`
Use kwset in pickaxe Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:41:57 +02:00			`kwsfree(kws);`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`return;`
			`}`