mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-09 02:33:11 +01:00

273 lines

6.4 KiB

C

Raw Normal View History

[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`/*`
			`* Copyright (C) 2005 Junio C Hamano`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`* Copyright (C) 2010 Google Inc.`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`*/`
			`#include "cache.h"`
			`#include "diff.h"`
			`#include "diffcore.h"`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`#include "xdiff-interface.h"`
Use kwset in pickaxe Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:41:57 +02:00			`#include "kwset.h"`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`typedef int (pickaxe_fn)(mmfile_t one, mmfile_t *two,`
			`struct diff_options *o,`
			`regex_t *regexp, kwset_t kws);`

			`static int pickaxe_match(struct diff_filepair p, struct diff_options o,`
			`regex_t *regexp, kwset_t kws, pickaxe_fn fn);`
pickaxe: factor out pickaxe Move the duplicate diff queue loop into its own function that accepts a match function: has_changes() for -S and diff_grep() for -G. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:28 +02:00
			`static void pickaxe(struct diff_queue_struct q, struct diff_options o,`
			`regex_t *regexp, kwset_t kws, pickaxe_fn fn)`
			`{`
			`int i;`
			`struct diff_queue_struct outq;`

			`DIFF_QUEUE_CLEAR(&outq);`

			`if (o->pickaxe_opts & DIFF_PICKAXE_ALL) {`
			`/* Showing the whole changeset if needle exists */`
			`for (i = 0; i < q->nr; i++) {`
			`struct diff_filepair *p = q->queue[i];`
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`if (pickaxe_match(p, o, regexp, kws, fn))`
pickaxe: factor out pickaxe Move the duplicate diff queue loop into its own function that accepts a match function: has_changes() for -S and diff_grep() for -G. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:28 +02:00			`return; /* do not munge the queue */`
			`}`

			`/*`
			`* Otherwise we will clear the whole queue by copying`
			`* the empty outq at the end of this function, but`
			`* first clear the current entries in the queue.`
			`*/`
			`for (i = 0; i < q->nr; i++)`
			`diff_free_filepair(q->queue[i]);`
			`} else {`
			`/* Showing only the filepairs that has the needle */`
			`for (i = 0; i < q->nr; i++) {`
			`struct diff_filepair *p = q->queue[i];`
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`if (pickaxe_match(p, o, regexp, kws, fn))`
pickaxe: factor out pickaxe Move the duplicate diff queue loop into its own function that accepts a match function: has_changes() for -S and diff_grep() for -G. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:28 +02:00			`diff_q(&outq, p);`
			`else`
			`diff_free_filepair(p);`
			`}`
			`}`

			`free(q->queue);`
			`*q = outq;`
			`}`

git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`struct diffgrep_cb {`
			`regex_t *regexp;`
			`int hit;`
			`};`

			`static void diffgrep_consume(void priv, char line, unsigned long len)`
			`{`
			`struct diffgrep_cb *data = priv;`
			`regmatch_t regmatch;`
			`int hold;`

			`if (line[0] != '+' && line[0] != '-')`
			`return;`
			`if (data->hit)`
			`/*`
			`* NEEDSWORK: we should have a way to terminate the`
			`* caller early.`
			`*/`
			`return;`
			`/* Yuck -- line ought to be "const char "! /`
			`hold = line[len];`
			`line[len] = '\0';`
			`data->hit = !regexec(data->regexp, line + 1, 1, &regmatch, 0);`
			`line[len] = hold;`
			`}`

diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`static int diff_grep(mmfile_t one, mmfile_t two,`
			`struct diff_options *o,`
pickaxe: give diff_grep the same signature as has_changes Change diff_grep() to match the signature of has_changes() as a preparation for the next patch that will use function pointers to the two. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:18 +02:00			`regex_t *regexp, kwset_t kws)`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`{`
			`regmatch_t regmatch;`
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`struct diffgrep_cb ecbdata;`
			`xpparam_t xpp;`
			`xdemitconf_t xecfg;`
diffcore-pickaxe: remove unnecessary call to get_textconv() The fill_one() function is responsible for finding and filling the textconv filter as necessary, and is called by diff_grep() function that implements "git log -G<pattern>". The has_changes() function that implements "git log -S<block>" calls get_textconv() for two sides being compared, before it checks to see if it was asked to perform the pickaxe limiting. Move the code around to avoid this wastage. After has_changes() calls get_textconv() to obtain textconv for both sides, fill_one() is called to use them. By adding get_textconv() to diff_grep() and relieving fill_one() of responsibility to find the textconv filter, we can avoid calling get_textconv() twice in has_changes(). With this change it's also no longer necessary for fill_one() to modify the textconv argument, therefore pass a pointer instead of a pointer to a pointer. Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-04 22:20:29 +02:00
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`if (!one)`
			`return !regexec(regexp, two->ptr, 1, &regmatch, 0);`
			`if (!two)`
			`return !regexec(regexp, one->ptr, 1, &regmatch, 0);`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`/*`
			`* We have both sides; need to run textual diff and see if`
			`* the pattern appears on added/deleted lines.`
			`*/`
			`memset(&xpp, 0, sizeof(xpp));`
			`memset(&xecfg, 0, sizeof(xecfg));`
			`ecbdata.regexp = regexp;`
			`ecbdata.hit = 0;`
			`xecfg.ctxlen = o->context;`
			`xecfg.interhunkctxlen = o->interhunkcontext;`
			`xdi_diff_outf(one, two, diffgrep_consume, &ecbdata,`
			`&xpp, &xecfg);`
			`return ecbdata.hit;`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`}`

			`static void diffcore_pickaxe_grep(struct diff_options *o)`
			`{`
pickaxe: factor out pickaxe Move the duplicate diff queue loop into its own function that accepts a match function: has_changes() for -S and diff_grep() for -G. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:28 +02:00			`int err;`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`regex_t regex;`
pickaxe: allow -i to search in patch case-insensitively "git log -S<string>" is a useful way to find the last commit in the codebase that touched the <string>. As it was designed to be used by a porcelain script to dig the history starting from a block of text that appear in the starting commit, it never had to look for anything but an exact match. When used by an end user who wants to look for the last commit that removed a string (e.g. name of a variable) that he vaguely remembers, however, it is useful to support case insensitive match. When given the "--regexp-ignore-case" (or "-i") option, which originally was designed to affect case sensitivity of the search done in the commit log part, e.g. "log --grep", the matches made with -S/-G pickaxe search is done case insensitively now. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-21 10:02:46 +01:00			`int cflags = REG_EXTENDED \| REG_NEWLINE;`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00
pickaxe: allow -i to search in patch case-insensitively "git log -S<string>" is a useful way to find the last commit in the codebase that touched the <string>. As it was designed to be used by a porcelain script to dig the history starting from a block of text that appear in the starting commit, it never had to look for anything but an exact match. When used by an end user who wants to look for the last commit that removed a string (e.g. name of a variable) that he vaguely remembers, however, it is useful to support case insensitive match. When given the "--regexp-ignore-case" (or "-i") option, which originally was designed to affect case sensitivity of the search done in the commit log part, e.g. "log --grep", the matches made with -S/-G pickaxe search is done case insensitively now. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-21 10:02:46 +01:00			`if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))`
			`cflags \|= REG_ICASE;`

			`err = regcomp(&regex, o->pickaxe, cflags);`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`if (err) {`
			`char errbuf[1024];`
			`regerror(err, &regex, errbuf, 1024);`
			`regfree(&regex);`
diffcore-pickaxe: make error messages more consistent Currently, diffcore-pickaxe reports two distinct errors for the same user error: $ git log --pickaxe-regex -S'\1' fatal: invalid pickaxe regex: Invalid back reference $ git log -G'\1' fatal: invalid log-grep regex: Invalid back reference This "log-grep" was only an internal name for the -G feature during development, and invite confusion with "git log --grep=<pattern>". Change the error messages to say "invalid regex". Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-31 14:12:14 +02:00			`die("invalid regex: %s", errbuf);`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`}`

pickaxe: factor out pickaxe Move the duplicate diff queue loop into its own function that accepts a match function: has_changes() for -S and diff_grep() for -G. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:28 +02:00			`pickaxe(&diff_queued_diff, o, &regex, NULL, diff_grep);`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00
pickaxe: plug regex leak With -G... --pickaxe-all, free the regex before returning even if we found a match. Also get rid of the variable has_changes, as we can simply break out of the loop. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:14:55 +02:00			`regfree(&regex);`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`return;`
			`}`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00
diffcore-pickaxe: simplify has_changes and contains Halve the number of callsites of contains() to two using temporary variables, simplifying the code. While at it, get rid of the diff_options parameter, which became unused with 8fa4b09f. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-06 15:53:27 +02:00			`static unsigned int contains(mmfile_t mf, regex_t regexp, kwset_t kws)`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`{`
[PATCH] diffcore-pickaxe: switch to "counting" behaviour. Instead of finding old/new pair that one side has and the other side does not have the specified string, find old/new pair that contains the specified string as a substring different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-07-24 01:35:25 +02:00			`unsigned int cnt;`
diffcore-pickaxe: use memmem() Use memmem() instead of open-coding it. The system libraries usually have a much faster version than the memcmp()-loop here. Even our own fall-back in compat/, which is used on Windows, is slightly faster. The following commands were run in a Linux kernel repository and timed, the best of five results is shown: $ STRING='Ensure that the real time constraints are schedulable.' $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null On Ubuntu 8.10 x64, before (v1.6.2-rc2): 8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30952minor)pagefaults 0swaps And with the patch: 1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30645minor)pagefaults 0swaps On Fedora 10 x64, before: 8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+29268minor)pagefaults 0swaps And with the patch: 1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32253minor)pagefaults 0swaps On Windows Vista x64, before: real 0m9.204s user 0m0.000s sys 0m0.000s And with the patch: real 0m8.470s user 0m0.000s sys 0m0.000s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-03 00:00:55 +01:00			`unsigned long sz;`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`const char *data;`
[PATCH] diffcore-pickaxe: switch to "counting" behaviour. Instead of finding old/new pair that one side has and the other side does not have the specified string, find old/new pair that contains the specified string as a substring different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-07-24 01:35:25 +02:00
pickaxe: use textconv for -S counting We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net> 2012-10-28 13:27:12 +01:00			`sz = mf->size;`
			`data = mf->ptr;`
[PATCH] diffcore-pickaxe: switch to "counting" behaviour. Instead of finding old/new pair that one side has and the other side does not have the specified string, find old/new pair that contains the specified string as a substring different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-07-24 01:35:25 +02:00			`cnt = 0;`

Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`if (regexp) {`
			`regmatch_t regmatch;`
			`int flags = 0;`

pickaxe: count regex matches only once When --pickaxe-regex is used, forward past the end of matches instead of advancing to the byte after their start. This way matches count only once, even if the regular expression matches their tail -- like in the fixed-string fork of the code. E.g.: /.*/ used to count the number of bytes instead of the number of lines. /aa/ resulted in a count of two in "aaa" instead of one. Also document the fact that regexec() needs a NUL-terminated string as its second argument by adding an assert(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-16 19:38:42 +01:00			`assert(data[sz] == '\0');`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`while (*data && !regexec(regexp, data, 1, &regmatch, flags)) {`
			`flags \|= REG_NOTBOL;`
pickaxe: count regex matches only once When --pickaxe-regex is used, forward past the end of matches instead of advancing to the byte after their start. This way matches count only once, even if the regular expression matches their tail -- like in the fixed-string fork of the code. E.g.: /.*/ used to count the number of bytes instead of the number of lines. /aa/ resulted in a count of two in "aaa" instead of one. Also document the fact that regexec() needs a NUL-terminated string as its second argument by adding an assert(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-16 19:38:42 +01:00			`data += regmatch.rm_eo;`
			`if (*data && regmatch.rm_so == regmatch.rm_eo)`
			`data++;`
[PATCH] diffcore-pickaxe: switch to "counting" behaviour. Instead of finding old/new pair that one side has and the other side does not have the specified string, find old/new pair that contains the specified string as a substring different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-07-24 01:35:25 +02:00			`cnt++;`
			`}`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00
			`} else { /* Classic exact string match */`
diffcore-pickaxe: use memmem() Use memmem() instead of open-coding it. The system libraries usually have a much faster version than the memcmp()-loop here. Even our own fall-back in compat/, which is used on Windows, is slightly faster. The following commands were run in a Linux kernel repository and timed, the best of five results is shown: $ STRING='Ensure that the real time constraints are schedulable.' $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null On Ubuntu 8.10 x64, before (v1.6.2-rc2): 8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30952minor)pagefaults 0swaps And with the patch: 1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30645minor)pagefaults 0swaps On Fedora 10 x64, before: 8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+29268minor)pagefaults 0swaps And with the patch: 1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32253minor)pagefaults 0swaps On Windows Vista x64, before: real 0m9.204s user 0m0.000s sys 0m0.000s And with the patch: real 0m8.470s user 0m0.000s sys 0m0.000s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-03 00:00:55 +01:00			`while (sz) {`
pickaxe: pass diff_options to contains and has_changes Remove the unused parameter needle from contains() and has_changes(). Also replace the parameter len with a pointer to the diff_options. We can use its member pickaxe to check if the needle is an empty string and use the kwsmatch structure to find out the length of the match instead. This change is done as a preparation to unify the signatures of has_changes() and diff_grep(), which will be used in the patch after the next one to factor out common code. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:06 +02:00			`struct kwsmatch kwsm;`
			`size_t offset = kwsexec(kws, data, sz, &kwsm);`
Use kwset in pickaxe Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:41:57 +02:00			`const char *found;`
			`if (offset == -1)`
diffcore-pickaxe: use memmem() Use memmem() instead of open-coding it. The system libraries usually have a much faster version than the memcmp()-loop here. Even our own fall-back in compat/, which is used on Windows, is slightly faster. The following commands were run in a Linux kernel repository and timed, the best of five results is shown: $ STRING='Ensure that the real time constraints are schedulable.' $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null On Ubuntu 8.10 x64, before (v1.6.2-rc2): 8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30952minor)pagefaults 0swaps And with the patch: 1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30645minor)pagefaults 0swaps On Fedora 10 x64, before: 8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+29268minor)pagefaults 0swaps And with the patch: 1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32253minor)pagefaults 0swaps On Windows Vista x64, before: real 0m9.204s user 0m0.000s sys 0m0.000s And with the patch: real 0m8.470s user 0m0.000s sys 0m0.000s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-03 00:00:55 +01:00			`break;`
Use kwset in pickaxe Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:41:57 +02:00			`else`
			`found = data + offset;`
pickaxe: pass diff_options to contains and has_changes Remove the unused parameter needle from contains() and has_changes(). Also replace the parameter len with a pointer to the diff_options. We can use its member pickaxe to check if the needle is an empty string and use the kwsmatch structure to find out the length of the match instead. This change is done as a preparation to unify the signatures of has_changes() and diff_grep(), which will be used in the patch after the next one to factor out common code. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:06 +02:00			`sz -= found - data + kwsm.size[0];`
			`data = found + kwsm.size[0];`
diffcore-pickaxe: use memmem() Use memmem() instead of open-coding it. The system libraries usually have a much faster version than the memcmp()-loop here. Even our own fall-back in compat/, which is used on Windows, is slightly faster. The following commands were run in a Linux kernel repository and timed, the best of five results is shown: $ STRING='Ensure that the real time constraints are schedulable.' $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null On Ubuntu 8.10 x64, before (v1.6.2-rc2): 8.09user 0.04system 0:08.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30952minor)pagefaults 0swaps And with the patch: 1.50user 0.04system 0:01.54elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+30645minor)pagefaults 0swaps On Fedora 10 x64, before: 8.34user 0.05system 0:08.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+29268minor)pagefaults 0swaps And with the patch: 1.15user 0.05system 0:01.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32253minor)pagefaults 0swaps On Windows Vista x64, before: real 0m9.204s user 0m0.000s sys 0m0.000s And with the patch: real 0m8.470s user 0m0.000s sys 0m0.000s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-03 00:00:55 +01:00			`cnt++;`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`}`
[PATCH] diffcore-pickaxe: switch to "counting" behaviour. Instead of finding old/new pair that one side has and the other side does not have the specified string, find old/new pair that contains the specified string as a substring different number of times. This would still not catch a case where you introduce two static variable declarations and remove two static function definitions from a file with -S"static", but would make it behave a bit more intuitively. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-07-24 01:35:25 +02:00			`}`
			`return cnt;`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`}`

diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`static int has_changes(mmfile_t one, mmfile_t two,`
			`struct diff_options *o,`
pickaxe: pass diff_options to contains and has_changes Remove the unused parameter needle from contains() and has_changes(). Also replace the parameter len with a pointer to the diff_options. We can use its member pickaxe to check if the needle is an empty string and use the kwsmatch structure to find out the length of the match instead. This change is done as a preparation to unify the signatures of has_changes() and diff_grep(), which will be used in the patch after the next one to factor out common code. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:06 +02:00			`regex_t *regexp, kwset_t kws)`
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`{`
diffcore-pickaxe: simplify has_changes and contains Halve the number of callsites of contains() to two using temporary variables, simplifying the code. While at it, get rid of the diff_options parameter, which became unused with 8fa4b09f. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-06 15:53:27 +02:00			`unsigned int one_contains = one ? contains(one, regexp, kws) : 0;`
			`unsigned int two_contains = two ? contains(two, regexp, kws) : 0;`
			`return one_contains != two_contains;`
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`}`

			`static int pickaxe_match(struct diff_filepair p, struct diff_options o,`
			`regex_t *regexp, kwset_t kws, pickaxe_fn fn)`
pickaxe: factor out has_changes Move duplicate if/else construct into its own helper function. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:26:24 +02:00			`{`
diffcore-pickaxe: remove unnecessary call to get_textconv() The fill_one() function is responsible for finding and filling the textconv filter as necessary, and is called by diff_grep() function that implements "git log -G<pattern>". The has_changes() function that implements "git log -S<block>" calls get_textconv() for two sides being compared, before it checks to see if it was asked to perform the pickaxe limiting. Move the code around to avoid this wastage. After has_changes() calls get_textconv() to obtain textconv for both sides, fill_one() is called to use them. By adding get_textconv() to diff_grep() and relieving fill_one() of responsibility to find the textconv filter, we can avoid calling get_textconv() twice in has_changes(). With this change it's also no longer necessary for fill_one() to modify the textconv argument, therefore pass a pointer instead of a pointer to a pointer. Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-04 22:20:29 +02:00			`struct userdiff_driver *textconv_one = NULL;`
			`struct userdiff_driver *textconv_two = NULL;`
pickaxe: use textconv for -S counting We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net> 2012-10-28 13:27:12 +01:00			`mmfile_t mf1, mf2;`
			`int ret;`

pickaxe: hoist empty needle check If we are given an empty pickaxe needle like "git log -S ''", it is impossible for us to find anything (because no matter what the content, the count will always be 0). We currently check this at the lowest level of contains(). Let's hoist the logic much earlier to has_changes(), so that it is simpler to return our answer before loading any blob data. Signed-off-by: Jeff King <peff@peff.net> 2012-10-28 13:34:06 +01:00			`if (!o->pickaxe[0])`
			`return 0;`

diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`/* ignore unmerged */`
			`if (!DIFF_FILE_VALID(p->one) && !DIFF_FILE_VALID(p->two))`
			`return 0;`

diffcore-pickaxe: respect --no-textconv git log -S doesn't respect --no-textconv: $ echo '*.txt diff=wrong' > .gitattributes $ git -c diff.wrong.textconv='xxx' log --no-textconv -Sfoo error: cannot run xxx: No such file or directory fatal: unable to read files to diff Reported-by: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 15:16:30 +02:00			`if (DIFF_OPT_TST(o, ALLOW_TEXTCONV)) {`
			`textconv_one = get_textconv(p->one);`
			`textconv_two = get_textconv(p->two);`
			`}`
diffcore-pickaxe: remove unnecessary call to get_textconv() The fill_one() function is responsible for finding and filling the textconv filter as necessary, and is called by diff_grep() function that implements "git log -G<pattern>". The has_changes() function that implements "git log -S<block>" calls get_textconv() for two sides being compared, before it checks to see if it was asked to perform the pickaxe limiting. Move the code around to avoid this wastage. After has_changes() calls get_textconv() to obtain textconv for both sides, fill_one() is called to use them. By adding get_textconv() to diff_grep() and relieving fill_one() of responsibility to find the textconv filter, we can avoid calling get_textconv() twice in has_changes(). With this change it's also no longer necessary for fill_one() to modify the textconv argument, therefore pass a pointer instead of a pointer to a pointer. Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-04 22:20:29 +02:00
pickaxe: use textconv for -S counting We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net> 2012-10-28 13:27:12 +01:00			`/*`
			`* If we have an unmodified pair, we know that the count will be the`
			`* same and don't even have to load the blobs. Unless textconv is in`
			`* play, _and_ we are using two different textconv filters (e.g.,`
			`* because a pair is an exact rename with different textconv attributes`
			`* for each side, which might generate different content).`
			`*/`
			`if (textconv_one == textconv_two && diff_unmodified_pair(p))`
			`return 0;`

diffcore-pickaxe: remove fill_one() fill_one is _almost_ identical to just calling fill_textconv; the exception is that for the !DIFF_FILE_VALID case, fill_textconv gives us an empty buffer rather than a NULL one. Since we currently use the NULL pointer as a signal that the file is not present on one side of the diff, we must now switch to using DIFF_FILE_VALID to make the same check. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 02:08:47 +02:00			`mf1.size = fill_textconv(textconv_one, p->one, &mf1.ptr);`
			`mf2.size = fill_textconv(textconv_two, p->two, &mf2.ptr);`
pickaxe: use textconv for -S counting We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net> 2012-10-28 13:27:12 +01:00
diffcore-pickaxe: unify code for log -S/-G The logic flow of has_changes() used for "log -S" and diff_grep() used for "log -G" are essentially the same. See if we have both sides that could be different in any interesting way, slurp the contents in core, possibly after applying textconv, inspect the contents, clean-up and report the result. The only difference between the two is how "inspect" step works. Unify this codeflow in a helper, pickaxe_match(), which takes a callback function that implements the specific "inspect" step. After removing the common scaffolding code from the existing has_changes() and diff_grep(), they each becomes such a callback function suitable for passing to pickaxe_match(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 07:28:10 +02:00			`ret = fn(DIFF_FILE_VALID(p->one) ? &mf1 : NULL,`
			`DIFF_FILE_VALID(p->two) ? &mf2 : NULL,`
			`o, regexp, kws);`
pickaxe: use textconv for -S counting We currently just look at raw blob data when using "-S" to pickaxe. This is mostly historical, as pickaxe predates the textconv feature. If the user has bothered to define a textconv filter, it is more likely that their search string will be on the textconv output, as that is what they will see in the diff (and we do not even provide a mechanism for them to search for binary needles that contain NUL characters). This patch teaches "-S" to use textconv, just as we already do for "-G". Signed-off-by: Jeff King <peff@peff.net> 2012-10-28 13:27:12 +01:00
			`if (textconv_one)`
			`free(mf1.ptr);`
			`if (textconv_two)`
			`free(mf2.ptr);`
			`diff_free_filespec_data(p->one);`
			`diff_free_filespec_data(p->two);`

			`return ret;`
pickaxe: factor out has_changes Move duplicate if/else construct into its own helper function. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:26:24 +02:00			`}`

git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`static void diffcore_pickaxe_count(struct diff_options *o)`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`{`
diff: pass the entire diff-options to diffcore_pickaxe() That would make it easier to give enhanced feature to the pickaxe transformation. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-31 22:44:39 +02:00			`const char *needle = o->pickaxe;`
			`int opts = o->pickaxe_opts;`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`unsigned long len = strlen(needle);`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`regex_t regex, *regexp = NULL;`
Use kwset in pickaxe Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:41:57 +02:00			`kwset_t kws = NULL;`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`if (opts & DIFF_PICKAXE_REGEX) {`
			`int err;`
			`err = regcomp(&regex, needle, REG_EXTENDED \| REG_NEWLINE);`
			`if (err) {`
			`/* The POSIX.2 people are surely sick */`
			`char errbuf[1024];`
			`regerror(err, &regex, errbuf, 1024);`
			`regfree(&regex);`
diffcore-pickaxe: make error messages more consistent Currently, diffcore-pickaxe reports two distinct errors for the same user error: $ git log --pickaxe-regex -S'\1' fatal: invalid pickaxe regex: Invalid back reference $ git log -G'\1' fatal: invalid log-grep regex: Invalid back reference This "log-grep" was only an internal name for the -G feature during development, and invite confusion with "git log --grep=<pattern>". Change the error messages to say "invalid regex". Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-05-31 14:12:14 +02:00			`die("invalid regex: %s", errbuf);`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`}`
			`regexp = &regex;`
Use kwset in pickaxe Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:41:57 +02:00			`} else {`
pickaxe: allow -i to search in patch case-insensitively "git log -S<string>" is a useful way to find the last commit in the codebase that touched the <string>. As it was designed to be used by a porcelain script to dig the history starting from a block of text that appear in the starting commit, it never had to look for anything but an exact match. When used by an end user who wants to look for the last commit that removed a string (e.g. name of a variable) that he vaguely remembers, however, it is useful to support case insensitive match. When given the "--regexp-ignore-case" (or "-i") option, which originally was designed to affect case sensitivity of the search done in the commit log part, e.g. "log --grep", the matches made with -S/-G pickaxe search is done case insensitively now. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-21 10:02:46 +01:00			`kws = kwsalloc(DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE)`
			`? tolower_trans_tbl : NULL);`
Use kwset in pickaxe Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:41:57 +02:00			`kwsincr(kws, needle, len);`
			`kwsprep(kws);`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`}`

pickaxe: factor out pickaxe Move the duplicate diff queue loop into its own function that accepts a match function: has_changes() for -S and diff_grep() for -G. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:50:28 +02:00			`pickaxe(&diff_queued_diff, o, regexp, kws, has_changes);`
pickaxe: plug regex/kws leak With -S... --pickaxe-all, free the regex or the kws before returning even if we found a match. Also get rid of the variable has_changes, as we can simply break out of the loop. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-10-06 18:23:11 +02:00
diffcore-pickaxe.c: remove unnecessary curly braces Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-05 00:51:47 +02:00			`if (opts & DIFF_PICKAXE_REGEX)`
Support for pickaxe matching regular expressions git-diff-* --pickaxe-regex will change the -S pickaxe to match POSIX extended regular expressions instead of fixed strings. The regex.h library is a rather stupid interface and I like pcre too, but with any luck it will be everywhere we will want to run Git on, it being POSIX.2 and all. I'm not sure if we can expect platforms like AIX to conform to POSIX.2 or if win32 has regex.h. We might add a flag to Makefile if there is a portability trouble potential. Signed-off-by: Petr Baudis <pasky@suse.cz> 2006-03-29 02:16:33 +02:00			`regfree(&regex);`
Use kwset in pickaxe Benchmarks in the hot cache case: before: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 47,092,744 cache-misses # 2.825 M/sec ( +- 1.607% ) 123,368,389 cache-references # 7.400 M/sec ( +- 0.812% ) 330,040,998 branch-misses # 3.134 % ( +- 0.257% ) 10,530,896,750 branches # 631.663 M/sec ( +- 0.121% ) 62,037,201,030 instructions # 1.399 IPC ( +- 0.142% ) 44,331,294,321 cycles # 2659.073 M/sec ( +- 0.326% ) 96,794 page-faults # 0.006 M/sec ( +- 11.952% ) 25 CPU-migrations # 0.000 M/sec ( +- 25.266% ) 1,424 context-switches # 0.000 M/sec ( +- 0.540% ) 16671.708650 task-clock-msecs # 0.997 CPUs ( +- 0.343% ) 16.728692052 seconds time elapsed ( +- 0.344% ) after: $ perf stat --repeat=5 git log -Sqwerty Performance counter stats for 'git log -Sqwerty' (5 runs): 51,385,522 cache-misses # 4.619 M/sec ( +- 0.565% ) 129,177,880 cache-references # 11.611 M/sec ( +- 0.219% ) 319,222,775 branch-misses # 6.946 % ( +- 0.134% ) 4,595,913,233 branches # 413.086 M/sec ( +- 0.112% ) 31,395,042,533 instructions # 1.062 IPC ( +- 0.129% ) 29,558,348,598 cycles # 2656.740 M/sec ( +- 0.204% ) 93,224 page-faults # 0.008 M/sec ( +- 4.487% ) 19 CPU-migrations # 0.000 M/sec ( +- 10.425% ) 950 context-switches # 0.000 M/sec ( +- 0.360% ) 11125.796039 task-clock-msecs # 0.997 CPUs ( +- 0.239% ) 11.164216599 seconds time elapsed ( +- 0.240% ) So the kwset code is about 33% faster. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-21 00:41:57 +02:00			`else`
			`kwsfree(kws);`
[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and make it available to the bare Plumbing layer. From the command line, the user gives a string he is intersted in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD \| ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-21 11:40:01 +02:00			`return;`
			`}`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00
			`void diffcore_pickaxe(struct diff_options *o)`
			`{`
			`/* Might want to warn when both S and G are on; I don't care... */`
			`if (o->pickaxe_opts & DIFF_PICKAXE_KIND_G)`
diffcore-pickaxe.c: a void function shouldn't try to return something Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-05 00:51:48 +02:00			`diffcore_pickaxe_grep(o);`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`else`
diffcore-pickaxe.c: a void function shouldn't try to return something Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-10-05 00:51:48 +02:00			`diffcore_pickaxe_count(o);`
git log/diff: add -G<regexp> that greps in the patch text Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the "git diff" family of commands. This limits the diff queue to filepairs whose patch text actually has an added or a deleted line that matches the given regexp. Unlike "-S<regexp>", changing other parts of the line that has a substring that matches the given regexp IS counted as a change, as such a change would appear as one deletion followed by one addition in a patch text. Unlike -S (pickaxe) that is intended to be used to quickly detect a commit that changes the number of occurrences of hits between the preimage and the postimage to serve as a part of larger toolchain, this is meant to be used as the top-level Porcelain feature. The implementation unfortunately has to run "diff" twice if you are running "log" family of commands to produce patches in the final output (e.g. "git log -p" or "git format-patch"). I think we _could_ cache the result in-core if we wanted to, but that would require larger surgery to the diffcore machinery (i.e. adding an extra pointer in the filepair structure to keep a pointer to a strbuf around, stuff the textual diff to the strbuf inside diffgrep_consume(), and make use of it in later stages when it is available) and it may not be worth it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-23 19:17:03 +02:00			`}`