mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-09 02:33:11 +01:00

1257 lines

31 KiB

C

Raw Normal View History

Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`#include "git-compat-util.h"`
			`#include "line-range.h"`
			`#include "cache.h"`
			`#include "tag.h"`
			`#include "blob.h"`
			`#include "tree.h"`
			`#include "diff.h"`
			`#include "commit.h"`
			`#include "decorate.h"`
			`#include "revision.h"`
			`#include "xdiff-interface.h"`
			`#include "strbuf.h"`
			`#include "log-tree.h"`
			`#include "graph.h"`
log -L: :pattern:file syntax to find by funcname This new syntax finds a funcname matching /pattern/, and then takes from there up to (but not including) the next funcname. So you can say git log -L:main:main.c and it will dig up the main() function and show its line-log, provided there are no other funcnames matching 'main'. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:33 +01:00			`#include "userdiff.h"`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`#include "line-log.h"`

			`static void range_set_grow(struct range_set *rs, size_t extra)`
			`{`
			`ALLOC_GROW(rs->ranges, rs->nr + extra, rs->alloc);`
			`}`

			`/* Either initialization would be fine */`
			`#define RANGE_SET_INIT {0}`

range-set: publish API for re-use by git-blame -L git-blame is slated to accept multiple -L ranges. git-log already accepts multiple -L's but its implementation of range-set, which organizes and normalizes -L ranges, is private. Publish the small subset of range-set API which is needed for git-blame multiple -L support. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-06 15:59:36 +02:00			`void range_set_init(struct range_set *rs, size_t prealloc)`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`{`
			`rs->alloc = rs->nr = 0;`
			`rs->ranges = NULL;`
			`if (prealloc)`
			`range_set_grow(rs, prealloc);`
			`}`

range-set: publish API for re-use by git-blame -L git-blame is slated to accept multiple -L ranges. git-log already accepts multiple -L's but its implementation of range-set, which organizes and normalizes -L ranges, is private. Publish the small subset of range-set API which is needed for git-blame multiple -L support. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-06 15:59:36 +02:00			`void range_set_release(struct range_set *rs)`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`{`
			`free(rs->ranges);`
			`rs->alloc = rs->nr = 0;`
			`rs->ranges = NULL;`
			`}`

			`/* dst must be uninitialized! */`
			`static void range_set_copy(struct range_set dst, struct range_set src)`
			`{`
			`range_set_init(dst, src->nr);`
			`memcpy(dst->ranges, src->ranges, src->nr*sizeof(struct range_set));`
			`dst->nr = src->nr;`
			`}`
			`static void range_set_move(struct range_set dst, struct range_set src)`
			`{`
			`range_set_release(dst);`
			`dst->ranges = src->ranges;`
			`dst->nr = src->nr;`
			`dst->alloc = src->alloc;`
			`src->ranges = NULL;`
			`src->alloc = src->nr = 0;`
			`}`

			`/* tack on a _new_ range _at the end_ */`
range-set: publish API for re-use by git-blame -L git-blame is slated to accept multiple -L ranges. git-log already accepts multiple -L's but its implementation of range-set, which organizes and normalizes -L ranges, is private. Publish the small subset of range-set API which is needed for git-blame multiple -L support. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-06 15:59:36 +02:00			`void range_set_append_unsafe(struct range_set *rs, long a, long b)`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`{`
			`assert(a <= b);`
			`range_set_grow(rs, 1);`
			`rs->ranges[rs->nr].start = a;`
			`rs->ranges[rs->nr].end = b;`
			`rs->nr++;`
			`}`

range-set: publish API for re-use by git-blame -L git-blame is slated to accept multiple -L ranges. git-log already accepts multiple -L's but its implementation of range-set, which organizes and normalizes -L ranges, is private. Publish the small subset of range-set API which is needed for git-blame multiple -L support. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-06 15:59:36 +02:00			`void range_set_append(struct range_set *rs, long a, long b)`
log -L: fix overlapping input ranges The existing code was too defensive, and would trigger the assert in range_set_append() if the user gave overlapping ranges. The intent was always to define overlapping ranges as just the union of all of them, as evidenced by the call to sort_and_merge_range_set(). (Which was already used, unlike what the comment said.) Fix by splitting out the meat of range_set_append() to a new _unsafe() function that lacks the paranoia. sort_and_merge_range_set will fix up the ranges, so we don't need the checks there. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 16:34:48 +02:00			`{`
			`assert(rs->nr == 0 \|\| rs->ranges[rs->nr-1].end <= a);`
			`range_set_append_unsafe(rs, a, b);`
			`}`

Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`static int range_cmp(const void _r, const void _s)`
			`{`
			`const struct range *r = _r;`
			`const struct range *s = _s;`

			`/* this could be simply 'return r.start-s.start', but for the types */`
			`if (r->start == s->start)`
			`return 0;`
			`if (r->start < s->start)`
			`return -1;`
			`return 1;`
			`}`

log -L: check range set invariants when we look it up lookup_line_range() is a good place to check that the range sets satisfy the invariants: they have been computed and set in earlier iterations, and we now start working with them. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 16:34:47 +02:00			`/*`
			`* Check that the ranges are non-empty, sorted and non-overlapping`
			`*/`
			`static void range_set_check_invariants(struct range_set *rs)`
			`{`
			`int i;`

			`if (!rs)`
			`return;`

			`if (rs->nr)`
			`assert(rs->ranges[0].start < rs->ranges[0].end);`

			`for (i = 1; i < rs->nr; i++) {`
			`assert(rs->ranges[i-1].end < rs->ranges[i].start);`
			`assert(rs->ranges[i].start < rs->ranges[i].end);`
			`}`
			`}`

Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`/*`
log -L: fix overlapping input ranges The existing code was too defensive, and would trigger the assert in range_set_append() if the user gave overlapping ranges. The intent was always to define overlapping ranges as just the union of all of them, as evidenced by the call to sort_and_merge_range_set(). (Which was already used, unlike what the comment said.) Fix by splitting out the meat of range_set_append() to a new _unsafe() function that lacks the paranoia. sort_and_merge_range_set will fix up the ranges, so we don't need the checks there. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 16:34:48 +02:00			`* In-place pass of sorting and merging the ranges in the range set,`
			`* to establish the invariants when we get the ranges from the user`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`*/`
range-set: publish API for re-use by git-blame -L git-blame is slated to accept multiple -L ranges. git-log already accepts multiple -L's but its implementation of range-set, which organizes and normalizes -L ranges, is private. Publish the small subset of range-set API which is needed for git-blame multiple -L support. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-06 15:59:36 +02:00			`void sort_and_merge_range_set(struct range_set *rs)`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`{`
			`int i;`
range-set: fix sort_and_merge_range_set() corner case bug When handed an empty range_set (range_set.nr == 0), sort_and_merge_range_set() incorrectly sets range_set.nr to 1 at exit. Subsequent range_set functions then access the bogus range at element zero and crash or throw an assertion failure. Fix this bug. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Acked-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-23 16:28:04 +02:00			`int o = 0; /* output cursor */`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00
			`qsort(rs->ranges, rs->nr, sizeof(struct range), range_cmp);`

range-set: fix sort_and_merge_range_set() corner case bug When handed an empty range_set (range_set.nr == 0), sort_and_merge_range_set() incorrectly sets range_set.nr to 1 at exit. Subsequent range_set functions then access the bogus range at element zero and crash or throw an assertion failure. Fix this bug. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Acked-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-23 16:28:04 +02:00			`for (i = 0; i < rs->nr; i++) {`
range-set: satisfy non-empty ranges invariant range-set invariants are: ranges must be (1) non-empty, (2) disjoint, (3) sorted in ascending order. During processing, various range-set utility functions break the invariants (for instance, by adding empty ranges), with the expectation that a finalizing sort_and_merge_range_set() will restore sanity. sort_and_merge_range_set(), however, neglects to fold out empty ranges, thus it fails to satisfy the non-empty constraint. Subsequent range-set functions crash or throw an assertion failure upon encountering such an anomaly. Rectify the situation by having sort_and_merge_range_set() fold out empty ranges. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Acked-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-23 16:28:06 +02:00			`if (rs->ranges[i].start == rs->ranges[i].end)`
			`continue;`
range-set: fix sort_and_merge_range_set() corner case bug When handed an empty range_set (range_set.nr == 0), sort_and_merge_range_set() incorrectly sets range_set.nr to 1 at exit. Subsequent range_set functions then access the bogus range at element zero and crash or throw an assertion failure. Fix this bug. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Acked-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-23 16:28:04 +02:00			`if (o > 0 && rs->ranges[i].start <= rs->ranges[o-1].end) {`
range_set: fix coalescing bug when range is a subset of another When coalescing ranges, sort_and_merge_range_set() unconditionally assumes that the end of a range being folded into a preceding range should become the end of the coalesced range. This assumption, however, is invalid when one range is a subset of another. For example, given ranges 1-5 and 2-3 added via range_set_append_unsafe(), sort_and_merge_range_set() incorrectly coalesces them to range 1-3 rather than the correct union range 1-5. Fix this bug. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-09 07:55:05 +02:00			`if (rs->ranges[o-1].end < rs->ranges[i].end)`
			`rs->ranges[o-1].end = rs->ranges[i].end;`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`} else {`
			`rs->ranges[o].start = rs->ranges[i].start;`
			`rs->ranges[o].end = rs->ranges[i].end;`
			`o++;`
			`}`
			`}`
			`assert(o <= rs->nr);`
			`rs->nr = o;`
log -L: check range set invariants when we look it up lookup_line_range() is a good place to check that the range sets satisfy the invariants: they have been computed and set in earlier iterations, and we now start working with them. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 16:34:47 +02:00
			`range_set_check_invariants(rs);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`}`

			`/*`
			`* Union of range sets (i.e., sets of line numbers). Used to merge`
			`* them when searches meet at a common ancestor.`
			`*`
			`* This is also where the ranges are consolidated into canonical form:`
			`* overlapping and adjacent ranges are merged, and empty ranges are`
			`* removed.`
			`*/`
			`static void range_set_union(struct range_set *out,`
			`struct range_set a, struct range_set b)`
			`{`
			`int i = 0, j = 0, o = 0;`
			`struct range *ra = a->ranges;`
			`struct range *rb = b->ranges;`
			`/* cannot make an alias of out->ranges: it may change during grow */`

			`assert(out->nr == 0);`
			`while (i < a->nr \|\| j < b->nr) {`
			`struct range *new;`
			`if (i < a->nr && j < b->nr) {`
			`if (ra[i].start < rb[j].start)`
			`new = &ra[i++];`
			`else if (ra[i].start > rb[j].start)`
			`new = &rb[j++];`
			`else if (ra[i].end < rb[j].end)`
			`new = &ra[i++];`
			`else`
			`new = &rb[j++];`
			`} else if (i < a->nr) /* b exhausted */`
			`new = &ra[i++];`
			`else /* a exhausted */`
			`new = &rb[j++];`
			`if (new->start == new->end)`
			`; /* empty range */`
			`else if (!o \|\| out->ranges[o-1].end < new->start) {`
			`range_set_grow(out, 1);`
			`out->ranges[o].start = new->start;`
			`out->ranges[o].end = new->end;`
			`o++;`
			`} else if (out->ranges[o-1].end < new->end) {`
			`out->ranges[o-1].end = new->end;`
			`}`
			`}`
			`out->nr = o;`
			`}`

			`/*`
			`* Difference of range sets (out = a \ b). Pass the "interesting"`
			`* ranges as 'a' and the target side of the diff as 'b': it removes`
			`* the ranges for which the commit is responsible.`
			`*/`
			`static void range_set_difference(struct range_set *out,`
			`struct range_set a, struct range_set b)`
			`{`
			`int i, j = 0;`
			`for (i = 0; i < a->nr; i++) {`
			`long start = a->ranges[i].start;`
			`long end = a->ranges[i].end;`
			`while (start < end) {`
			`while (j < b->nr && start >= b->ranges[j].end)`
			`/*`
			`* a: \|-------`
			`* b: ------\|`
			`*/`
			`j++;`
			`if (j >= b->nr \|\| end < b->ranges[j].start) {`
			`/*`
			`* b exhausted, or`
			`* a: ----\|`
			`* b: \|----`
			`*/`
			`range_set_append(out, start, end);`
			`break;`
			`}`
			`if (start >= b->ranges[j].start) {`
			`/*`
			`* a: \|--????`
			`* b: \|------\|`
			`*/`
			`start = b->ranges[j].end;`
			`} else if (end > b->ranges[j].start) {`
			`/*`
			`* a: \|-----\|`
			`* b: \|--?????`
			`*/`
			`if (start < b->ranges[j].start)`
			`range_set_append(out, start, b->ranges[j].start);`
			`start = b->ranges[j].end;`
			`}`
			`}`
			`}`
			`}`

			`static void diff_ranges_init(struct diff_ranges *diff)`
			`{`
			`range_set_init(&diff->parent, 0);`
			`range_set_init(&diff->target, 0);`
			`}`

			`static void diff_ranges_release(struct diff_ranges *diff)`
			`{`
			`range_set_release(&diff->parent);`
			`range_set_release(&diff->target);`
			`}`

line-log.c: make line_log_data_init() static No external callers exist. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-01-14 23:39:02 +01:00			`static void line_log_data_init(struct line_log_data *r)`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`{`
			`memset(r, 0, sizeof(struct line_log_data));`
			`range_set_init(&r->ranges, 0);`
			`}`

			`static void line_log_data_clear(struct line_log_data *r)`
			`{`
			`range_set_release(&r->ranges);`
			`if (r->pair)`
			`diff_free_filepair(r->pair);`
			`}`

			`static void free_line_log_data(struct line_log_data *r)`
			`{`
			`while (r) {`
			`struct line_log_data *next = r->next;`
			`line_log_data_clear(r);`
			`free(r);`
			`r = next;`
			`}`
			`}`

			`static struct line_log_data *`
			`search_line_log_data(struct line_log_data list, const char path,`
			`struct line_log_data **insertion_point)`
			`{`
			`struct line_log_data *p = list;`
			`if (insertion_point)`
			`*insertion_point = NULL;`
			`while (p) {`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`int cmp = strcmp(p->path, path);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`if (!cmp)`
			`return p;`
			`if (insertion_point && cmp < 0)`
			`*insertion_point = p;`
			`p = p->next;`
			`}`
			`return NULL;`
			`}`

log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`/*`
			`* Note: takes ownership of 'path', which happens to be what the only`
			`* caller needs.`
			`*/`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`static void line_log_data_insert(struct line_log_data **list,`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`char *path,`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`long begin, long end)`
			`{`
			`struct line_log_data *ip;`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`struct line_log_data p = search_line_log_data(list, path, &ip);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00
			`if (p) {`
log -L: fix overlapping input ranges The existing code was too defensive, and would trigger the assert in range_set_append() if the user gave overlapping ranges. The intent was always to define overlapping ranges as just the union of all of them, as evidenced by the call to sort_and_merge_range_set(). (Which was already used, unlike what the comment said.) Fix by splitting out the meat of range_set_append() to a new _unsafe() function that lacks the paranoia. sort_and_merge_range_set will fix up the ranges, so we don't need the checks there. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 16:34:48 +02:00			`range_set_append_unsafe(&p->ranges, begin, end);`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`free(path);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`return;`
			`}`

			`p = xcalloc(1, sizeof(struct line_log_data));`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`p->path = path;`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`range_set_append(&p->ranges, begin, end);`
			`if (ip) {`
			`p->next = ip->next;`
			`ip->next = p;`
			`} else {`
			`p->next = *list;`
			`*list = p;`
			`}`
			`}`

			`struct collect_diff_cbdata {`
			`struct diff_ranges *diff;`
			`};`

			`static int collect_diff_cb(long start_a, long count_a,`
			`long start_b, long count_b,`
			`void *data)`
			`{`
			`struct collect_diff_cbdata *d = data;`

			`if (count_a >= 0)`
			`range_set_append(&d->diff->parent, start_a, start_a + count_a);`
			`if (count_b >= 0)`
			`range_set_append(&d->diff->target, start_b, start_b + count_b);`

			`return 0;`
			`}`

			`static void collect_diff(mmfile_t parent, mmfile_t target, struct diff_ranges *out)`
			`{`
			`struct collect_diff_cbdata cbdata = {NULL};`
			`xpparam_t xpp;`
			`xdemitconf_t xecfg;`
			`xdemitcb_t ecb;`

			`memset(&xpp, 0, sizeof(xpp));`
			`memset(&xecfg, 0, sizeof(xecfg));`
			`xecfg.ctxlen = xecfg.interhunkctxlen = 0;`

			`cbdata.diff = out;`
			`xecfg.hunk_func = collect_diff_cb;`
			`memset(&ecb, 0, sizeof(ecb));`
			`ecb.priv = &cbdata;`
			`xdi_diff(parent, target, &xpp, &xecfg, &ecb);`
			`}`

			`/*`
			`* These are handy for debugging. Removing them with #if 0 silences`
			`* the "unused function" warning.`
			`*/`
			`#if 0`
			`static void dump_range_set(struct range_set rs, const char desc)`
			`{`
			`int i;`
			`printf("range set %s (%d items):\n", desc, rs->nr);`
			`for (i = 0; i < rs->nr; i++)`
			`printf("\t[%ld,%ld]\n", rs->ranges[i].start, rs->ranges[i].end);`
			`}`

			`static void dump_line_log_data(struct line_log_data *r)`
			`{`
			`char buf[4096];`
			`while (r) {`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`snprintf(buf, 4096, "file %s\n", r->path);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`dump_range_set(&r->ranges, buf);`
			`r = r->next;`
			`}`
			`}`

			`static void dump_diff_ranges(struct diff_ranges diff, const char desc)`
			`{`
			`int i;`
			`assert(diff->parent.nr == diff->target.nr);`
			`printf("diff ranges %s (%d items):\n", desc, diff->parent.nr);`
			`printf("\tparent\ttarget\n");`
			`for (i = 0; i < diff->parent.nr; i++) {`
			`printf("\t[%ld,%ld]\t[%ld,%ld]\n",`
			`diff->parent.ranges[i].start,`
			`diff->parent.ranges[i].end,`
			`diff->target.ranges[i].start,`
			`diff->target.ranges[i].end);`
			`}`
			`}`
			`#endif`


			`static int ranges_overlap(struct range a, struct range b)`
			`{`
			`return !(a->end <= b->start \|\| b->end <= a->start);`
			`}`

			`/*`
			`* Given a diff and the set of interesting ranges, determine all hunks`
			`* of the diff which touch (overlap) at least one of the interesting`
			`* ranges in the target.`
			`*/`
			`static void diff_ranges_filter_touched(struct diff_ranges *out,`
			`struct diff_ranges *diff,`
			`struct range_set *rs)`
			`{`
			`int i, j = 0;`

			`assert(out->target.nr == 0);`

			`for (i = 0; i < diff->target.nr; i++) {`
			`while (diff->target.ranges[i].start > rs->ranges[j].end) {`
			`j++;`
			`if (j == rs->nr)`
			`return;`
			`}`
			`if (ranges_overlap(&diff->target.ranges[i], &rs->ranges[j])) {`
			`range_set_append(&out->parent,`
			`diff->parent.ranges[i].start,`
			`diff->parent.ranges[i].end);`
			`range_set_append(&out->target,`
			`diff->target.ranges[i].start,`
			`diff->target.ranges[i].end);`
			`}`
			`}`
			`}`

			`/*`
			`* Adjust the line counts in 'rs' to account for the lines`
			`* added/removed in the diff.`
			`*/`
			`static void range_set_shift_diff(struct range_set *out,`
			`struct range_set *rs,`
			`struct diff_ranges *diff)`
			`{`
			`int i, j = 0;`
			`long offset = 0;`
			`struct range *src = rs->ranges;`
			`struct range *target = diff->target.ranges;`
			`struct range *parent = diff->parent.ranges;`

			`for (i = 0; i < rs->nr; i++) {`
			`while (j < diff->target.nr && src[i].start >= target[j].start) {`
			`offset += (parent[j].end-parent[j].start)`
			`- (target[j].end-target[j].start);`
			`j++;`
			`}`
			`range_set_append(out, src[i].start+offset, src[i].end+offset);`
			`}`
			`}`

			`/*`
			`* Given a diff and the set of interesting ranges, map the ranges`
			`* across the diff. That is: observe that the target commit takes`
			`* blame for all the + (target-side) ranges. So for every pair of`
			`* ranges in the diff that was touched, we remove the latter and add`
			`* its parent side.`
			`*/`
			`static void range_set_map_across_diff(struct range_set *out,`
			`struct range_set *rs,`
			`struct diff_ranges *diff,`
			`struct diff_ranges **touched_out)`
			`{`
			`struct diff_ranges touched = xmalloc(sizeof(touched));`
			`struct range_set tmp1 = RANGE_SET_INIT;`
			`struct range_set tmp2 = RANGE_SET_INIT;`

			`diff_ranges_init(touched);`
			`diff_ranges_filter_touched(touched, diff, rs);`
			`range_set_difference(&tmp1, rs, &touched->target);`
			`range_set_shift_diff(&tmp2, &tmp1, diff);`
			`range_set_union(out, &tmp2, &touched->parent);`
			`range_set_release(&tmp1);`
			`range_set_release(&tmp2);`

			`*touched_out = touched;`
			`}`

			`static struct commit check_single_commit(struct rev_info revs)`
			`{`
			`struct object *commit = NULL;`
			`int found = -1;`
			`int i;`

			`for (i = 0; i < revs->pending.nr; i++) {`
			`struct object *obj = revs->pending.objects[i].item;`
			`if (obj->flags & UNINTERESTING)`
			`continue;`
			`while (obj->type == OBJ_TAG)`
			`obj = deref_tag(obj, NULL, 0);`
			`if (obj->type != OBJ_COMMIT)`
			`die("Non commit %s?", revs->pending.objects[i].name);`
			`if (commit)`
			`die("More than one commit to dig from: %s and %s?",`
			`revs->pending.objects[i].name,`
			`revs->pending.objects[found].name);`
			`commit = obj;`
			`found = i;`
			`}`

			`if (!commit)`
			`die("No commit specified?");`

			`return (struct commit *) commit;`
			`}`

			`static void fill_blob_sha1(struct commit commit, struct diff_filespec spec)`
			`{`
			`unsigned mode;`
			`unsigned char sha1[20];`

			`if (get_tree_entry(commit->object.sha1, spec->path,`
			`sha1, &mode))`
			`die("There is no path %s in the commit", spec->path);`
			`fill_filespec(spec, sha1, 1, mode);`

			`return;`
			`}`

			`static void fill_line_ends(struct diff_filespec spec, long lines,`
			`unsigned long **line_ends)`
			`{`
			`int num = 0, size = 50;`
			`long cur = 0;`
			`unsigned long *ends = NULL;`
			`char *data = NULL;`

			`if (diff_populate_filespec(spec, 0))`
			`die("Cannot read blob %s", sha1_to_hex(spec->sha1));`

			`ends = xmalloc(size * sizeof(*ends));`
			`ends[cur++] = 0;`
			`data = spec->data;`
			`while (num < spec->size) {`
			`if (data[num] == '\n' \|\| num == spec->size - 1) {`
			`ALLOC_GROW(ends, (cur + 1), size);`
			`ends[cur++] = num;`
			`}`
			`num++;`
			`}`

			`/* shrink the array to fit the elements */`
use REALLOC_ARRAY for changing the allocation size of arrays Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-09-16 20:56:57 +02:00			`REALLOC_ARRAY(ends, cur);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`*lines = cur-1;`
			`*line_ends = ends;`
			`}`

			`struct nth_line_cb {`
			`struct diff_filespec *spec;`
			`long lines;`
			`unsigned long *line_ends;`
			`};`

			`static const char nth_line(void data, long line)`
			`{`
			`struct nth_line_cb *d = data;`
			`assert(d && line <= d->lines);`
			`assert(d->spec && d->spec->data);`

			`if (line == 0)`
			`return (char *)d->spec->data;`
			`else`
			`return (char *)d->spec->data + d->line_ends[line] + 1;`
			`}`

			`static struct line_log_data *`
			`parse_lines(struct commit commit, const char prefix, struct string_list *args)`
			`{`
			`long lines = 0;`
			`unsigned long *ends = NULL;`
			`struct nth_line_cb cb_data;`
			`struct string_list_item *item;`
			`struct line_log_data *ranges = NULL;`
log: teach -L/RE/ to search from end of previous -L range This is complicated slightly by having to remember the previous -L range for each file specified via -L<range>:file. The existing implementation coalesces ranges for each file as each -L is parsed which makes it impossible to refer back to the previous -L range for any particular file. Re-implement to instead store each file's set of -L ranges verbatim, and then coalesce the ranges in a post-processing step. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-06 15:59:43 +02:00			`struct line_log_data *p;`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00
			`for_each_string_list_item(item, args) {`
			`const char name_part, range_part;`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`char *full_name;`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`struct diff_filespec *spec;`
			`long begin = 0, end = 0;`
log: teach -L/RE/ to search from end of previous -L range This is complicated slightly by having to remember the previous -L range for each file specified via -L<range>:file. The existing implementation coalesces ranges for each file as each -L is parsed which makes it impossible to refer back to the previous -L range for any particular file. Re-implement to instead store each file's set of -L ranges verbatim, and then coalesce the ranges in a post-processing step. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-06 15:59:43 +02:00			`long anchor;`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00
			`name_part = skip_range_arg(item->string);`
			`if (!name_part \|\| *name_part != ':' \|\| !name_part[1])`
log -L: improve error message on malformed argument The old message did not mention the :regex:file form. To avoid overly long lines, split the message into two lines (in case item->string is long, it will be the only part truncated in a narrow terminal). Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-04-20 14:09:07 +02:00			`die("-L argument not 'start,end:file' or ':funcname:file': %s",`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`item->string);`
			`range_part = xstrndup(item->string, name_part - item->string);`
			`name_part++;`

			`full_name = prefix_path(prefix, prefix ? strlen(prefix) : 0,`
			`name_part);`

			`spec = alloc_filespec(full_name);`
			`fill_blob_sha1(commit, spec);`
			`fill_line_ends(spec, &lines, &ends);`
			`cb_data.spec = spec;`
			`cb_data.lines = lines;`
			`cb_data.line_ends = ends;`

log: teach -L/RE/ to search from end of previous -L range This is complicated slightly by having to remember the previous -L range for each file specified via -L<range>:file. The existing implementation coalesces ranges for each file as each -L is parsed which makes it impossible to refer back to the previous -L range for any particular file. Re-implement to instead store each file's set of -L ranges verbatim, and then coalesce the ranges in a post-processing step. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-06 15:59:43 +02:00			`p = search_line_log_data(ranges, full_name, NULL);`
			`if (p && p->ranges.nr)`
			`anchor = p->ranges.ranges[p->ranges.nr - 1].end + 1;`
			`else`
			`anchor = 1;`

Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`if (parse_range_arg(range_part, nth_line, &cb_data,`
log: teach -L/RE/ to search from end of previous -L range This is complicated slightly by having to remember the previous -L range for each file specified via -L<range>:file. The existing implementation coalesces ranges for each file as each -L is parsed which makes it impossible to refer back to the previous -L range for any particular file. Re-implement to instead store each file's set of -L ranges verbatim, and then coalesce the ranges in a post-processing step. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-06 15:59:43 +02:00			`lines, anchor, &begin, &end,`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`full_name))`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`die("malformed -L argument '%s'", range_part);`
log: fix -L bounds checking bug When 12da1d1f added -L support to git-log, a broken bounds check was copied from git-blame -L which incorrectly allows -LX to extend one line past end of file without reporting an error. Instead, it generates an empty range. Fix this bug. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-31 10:15:41 +02:00			`if (lines < end \|\| ((lines \|\| begin) && lines < begin))`
			`die("file %s has only %lu lines", name_part, lines);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`if (begin < 1)`
			`begin = 1;`
			`if (end < 1)`
			`end = lines;`
			`begin--;`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`line_log_data_insert(&ranges, full_name, begin, end);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`free_filespec(spec);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`free(ends);`
			`ends = NULL;`
			`}`

log: teach -L/RE/ to search from end of previous -L range This is complicated slightly by having to remember the previous -L range for each file specified via -L<range>:file. The existing implementation coalesces ranges for each file as each -L is parsed which makes it impossible to refer back to the previous -L range for any particular file. Re-implement to instead store each file's set of -L ranges verbatim, and then coalesce the ranges in a post-processing step. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-08-06 15:59:43 +02:00			`for (p = ranges; p; p = p->next)`
			`sort_and_merge_range_set(&p->ranges);`

Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`return ranges;`
			`}`

			`static struct line_log_data line_log_data_copy_one(struct line_log_data r)`
			`{`
			`struct line_log_data ret = xmalloc(sizeof(ret));`

			`assert(r);`
			`line_log_data_init(ret);`
			`range_set_copy(&ret->ranges, &r->ranges);`

log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`ret->path = xstrdup(r->path);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00
			`return ret;`
			`}`

			`static struct line_log_data *`
			`line_log_data_copy(struct line_log_data *r)`
			`{`
			`struct line_log_data *ret = NULL;`
			`struct line_log_data tmp = NULL, prev = NULL;`

			`assert(r);`
			`ret = tmp = prev = line_log_data_copy_one(r);`
			`r = r->next;`
			`while (r) {`
			`tmp = line_log_data_copy_one(r);`
			`prev->next = tmp;`
			`prev = tmp;`
			`r = r->next;`
			`}`

			`return ret;`
			`}`

			`/* merge two range sets across files */`
			`static struct line_log_data line_log_data_merge(struct line_log_data a,`
			`struct line_log_data *b)`
			`{`
			`struct line_log_data head = NULL, *pp = &head;`

			`while (a \|\| b) {`
			`struct line_log_data *src;`
			`struct line_log_data *src2 = NULL;`
			`struct line_log_data *d;`
			`int cmp;`
			`if (!a)`
			`cmp = 1;`
			`else if (!b)`
			`cmp = -1;`
			`else`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`cmp = strcmp(a->path, b->path);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`if (cmp < 0) {`
			`src = a;`
			`a = a->next;`
			`} else if (cmp == 0) {`
			`src = a;`
			`a = a->next;`
			`src2 = b;`
			`b = b->next;`
			`} else {`
			`src = b;`
			`b = b->next;`
			`}`
			`d = xmalloc(sizeof(struct line_log_data));`
			`line_log_data_init(d);`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`d->path = xstrdup(src->path);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`*pp = d;`
			`pp = &d->next;`
			`if (src2)`
			`range_set_union(&d->ranges, &src->ranges, &src2->ranges);`
			`else`
			`range_set_copy(&d->ranges, &src->ranges);`
			`}`

			`return head;`
			`}`

			`static void add_line_range(struct rev_info revs, struct commit commit,`
			`struct line_log_data *range)`
			`{`
			`struct line_log_data *old = NULL;`
			`struct line_log_data *new = NULL;`

			`old = lookup_decoration(&revs->line_log_data, &commit->object);`
			`if (old && range) {`
			`new = line_log_data_merge(old, range);`
			`free_line_log_data(old);`
			`} else if (range)`
			`new = line_log_data_copy(range);`

			`if (new)`
			`add_decoration(&revs->line_log_data, &commit->object, new);`
			`}`

			`static void clear_commit_line_range(struct rev_info revs, struct commit commit)`
			`{`
			`struct line_log_data *r;`
			`r = lookup_decoration(&revs->line_log_data, &commit->object);`
			`if (!r)`
			`return;`
			`free_line_log_data(r);`
			`add_decoration(&revs->line_log_data, &commit->object, NULL);`
			`}`

			`static struct line_log_data lookup_line_range(struct rev_info revs,`
			`struct commit *commit)`
			`{`
			`struct line_log_data *ret = NULL;`
log -L: check range set invariants when we look it up lookup_line_range() is a good place to check that the range sets satisfy the invariants: they have been computed and set in earlier iterations, and we now start working with them. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 16:34:47 +02:00			`struct line_log_data *d;`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00
			`ret = lookup_decoration(&revs->line_log_data, &commit->object);`
log -L: check range set invariants when we look it up lookup_line_range() is a good place to check that the range sets satisfy the invariants: they have been computed and set in earlier iterations, and we now start working with them. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-05 16:34:47 +02:00
			`for (d = ret; d; d = d->next)`
			`range_set_check_invariants(&d->ranges);`

Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`return ret;`
			`}`

			`void line_log_init(struct rev_info rev, const char prefix, struct string_list *args)`
			`{`
			`struct commit *commit = NULL;`
			`struct line_log_data *range;`

			`commit = check_single_commit(rev);`
			`range = parse_lines(commit, prefix, args);`
			`add_line_range(rev, commit, range);`

			`if (!rev->diffopt.detect_rename) {`
			`int i, count = 0;`
			`struct line_log_data *r = range;`
			`const char **paths;`
			`while (r) {`
			`count++;`
			`r = r->next;`
			`}`
			`paths = xmalloc((count+1)sizeof(char ));`
			`r = range;`
			`for (i = 0; i < count; i++) {`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`paths[i] = xstrdup(r->path);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`r = r->next;`
			`}`
			`paths[count] = NULL;`
Fix calling parse_pathspec with no paths nor PATHSPEC_PREFER_* flags When parse_pathspec() is called with no paths, the behavior could be either return no paths, or return one path that is cwd. Some commands do the former, some the latter. parse_pathspec() itself does not make either the default and requires the caller to specify either flag if it may run into this situation. I've grep'd through all parse_pathspec() call sites. Some pass neither, but those are guaranteed never pass empty path to parse_pathspec(). There are two call sites that may pass empty path and are fixed with this patch. [jc: added a test from Antoine's bug report] Reported-by: Antoine Pelisse <apelisse@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-10-19 04:41:24 +02:00			`parse_pathspec(&rev->diffopt.pathspec, 0,`
			`PATHSPEC_PREFER_FULL, "", paths);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`free(paths);`
			`}`
			`}`

			`static void move_diff_queue(struct diff_queue_struct *dst,`
			`struct diff_queue_struct *src)`
			`{`
			`assert(src != dst);`
			`memcpy(dst, src, sizeof(struct diff_queue_struct));`
			`DIFF_QUEUE_CLEAR(src);`
			`}`

Speed up log -L... -M So far log -L only used the implicit diff filtering by pathspec. If the user specifies -M, we cannot do that, and so we simply handed the whole diff queue (which is approximately 'git show --raw') to diffcore_std(). Unfortunately this is very slow. We can optimize a lot if we throw out files that we know cannot possibly be interesting, in the same spirit that the pathspec filtering reduces the number of files. However, in this case, we have to be more careful. Because we want to look out for renames, we need to keep all filepairs where something was deleted. This is a bit hacky and should really be replaced by equivalent support in --follow, and just using that. However, in the meantime it speeds up 'log -M -L' by an order of magnitude. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:34 +01:00			`static void filter_diffs_for_paths(struct line_log_data *range, int keep_deletions)`
			`{`
			`int i;`
			`struct diff_queue_struct outq;`
			`DIFF_QUEUE_CLEAR(&outq);`

			`for (i = 0; i < diff_queued_diff.nr; i++) {`
			`struct diff_filepair *p = diff_queued_diff.queue[i];`
			`struct line_log_data *rg = NULL;`

			`if (!DIFF_FILE_VALID(p->two)) {`
			`if (keep_deletions)`
			`diff_q(&outq, p);`
			`else`
			`diff_free_filepair(p);`
			`continue;`
			`}`
			`for (rg = range; rg; rg = rg->next) {`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`if (!strcmp(rg->path, p->two->path))`
Speed up log -L... -M So far log -L only used the implicit diff filtering by pathspec. If the user specifies -M, we cannot do that, and so we simply handed the whole diff queue (which is approximately 'git show --raw') to diffcore_std(). Unfortunately this is very slow. We can optimize a lot if we throw out files that we know cannot possibly be interesting, in the same spirit that the pathspec filtering reduces the number of files. However, in this case, we have to be more careful. Because we want to look out for renames, we need to keep all filepairs where something was deleted. This is a bit hacky and should really be replaced by equivalent support in --follow, and just using that. However, in the meantime it speeds up 'log -M -L' by an order of magnitude. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:34 +01:00			`break;`
			`}`
			`if (rg)`
			`diff_q(&outq, p);`
			`else`
			`diff_free_filepair(p);`
			`}`
			`free(diff_queued_diff.queue);`
			`diff_queued_diff = outq;`
			`}`

			`static inline int diff_might_be_rename(void)`
			`{`
			`int i;`
			`for (i = 0; i < diff_queued_diff.nr; i++)`
			`if (!DIFF_FILE_VALID(diff_queued_diff.queue[i]->one)) {`
			`/* fprintf(stderr, "diff_might_be_rename found creation of: %s\n", */`
			`/* diff_queued_diff.queue[i]->two->path); */`
			`return 1;`
			`}`
			`return 0;`
			`}`

			`static void queue_diffs(struct line_log_data *range,`
			`struct diff_options *opt,`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`struct diff_queue_struct *queue,`
			`struct commit commit, struct commit parent)`
			`{`
			`assert(commit);`

			`DIFF_QUEUE_CLEAR(&diff_queued_diff);`
line-log: convert to using diff_tree_sha1() Since diff_tree_sha1() can now accept empty trees via NULL sha1, we could just call it without manually reading trees into tree_desc and duplicating code. Cc: Thomas Rast <tr@thomasrast.ch> Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-02-05 17:57:11 +01:00			`diff_tree_sha1(parent ? parent->tree->object.sha1 : NULL,`
			`commit->tree->object.sha1, "", opt);`
Speed up log -L... -M So far log -L only used the implicit diff filtering by pathspec. If the user specifies -M, we cannot do that, and so we simply handed the whole diff queue (which is approximately 'git show --raw') to diffcore_std(). Unfortunately this is very slow. We can optimize a lot if we throw out files that we know cannot possibly be interesting, in the same spirit that the pathspec filtering reduces the number of files. However, in this case, we have to be more careful. Because we want to look out for renames, we need to keep all filepairs where something was deleted. This is a bit hacky and should really be replaced by equivalent support in --follow, and just using that. However, in the meantime it speeds up 'log -M -L' by an order of magnitude. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:34 +01:00			`if (opt->detect_rename) {`
			`filter_diffs_for_paths(range, 1);`
			`if (diff_might_be_rename())`
			`diffcore_std(opt);`
			`filter_diffs_for_paths(range, 0);`
			`}`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`move_diff_queue(queue, &diff_queued_diff);`
			`}`

			`static char get_nth_line(long line, unsigned long ends, void *data)`
			`{`
			`if (line == 0)`
			`return (char *)data;`
			`else`
			`return (char *)data + ends[line] + 1;`
			`}`

			`static void print_line(const char *prefix, char first,`
			`long line, unsigned long ends, void data,`
			`const char color, const char reset)`
			`{`
			`char *begin = get_nth_line(line, ends, data);`
			`char *end = get_nth_line(line+1, ends, data);`
			`int had_nl = 0;`

			`if (end > begin && end[-1] == '\n') {`
			`end--;`
			`had_nl = 1;`
			`}`

			`fputs(prefix, stdout);`
			`fputs(color, stdout);`
			`putchar(first);`
			`fwrite(begin, 1, end-begin, stdout);`
			`fputs(reset, stdout);`
			`putchar('\n');`
			`if (!had_nl)`
			`fputs("\\ No newline at end of file\n", stdout);`
			`}`

			`static char output_prefix(struct diff_options opt)`
			`{`
			`char *prefix = "";`

			`if (opt->output_prefix) {`
			`struct strbuf *sb = opt->output_prefix(opt, opt->output_prefix_data);`
			`prefix = sb->buf;`
			`}`

			`return prefix;`
			`}`

			`static void dump_diff_hacky_one(struct rev_info rev, struct line_log_data range)`
			`{`
			`int i, j = 0;`
			`long p_lines, t_lines;`
			`unsigned long p_ends = NULL, t_ends = NULL;`
			`struct diff_filepair *pair = range->pair;`
			`struct diff_ranges *diff = &range->diff;`

			`struct diff_options *opt = &rev->diffopt;`
			`char *prefix = output_prefix(opt);`
			`const char *c_reset = diff_get_color(opt->use_color, DIFF_RESET);`
			`const char *c_frag = diff_get_color(opt->use_color, DIFF_FRAGINFO);`
			`const char *c_meta = diff_get_color(opt->use_color, DIFF_METAINFO);`
			`const char *c_old = diff_get_color(opt->use_color, DIFF_FILE_OLD);`
			`const char *c_new = diff_get_color(opt->use_color, DIFF_FILE_NEW);`
diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT The latter is a much more descriptive name (and we support "color.diff.context" now). This also updates the name of any local variables which were used to store the color. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-05-27 22:48:46 +02:00			`const char *c_context = diff_get_color(opt->use_color, DIFF_CONTEXT);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00
			`if (!pair \|\| !diff)`
			`return;`

			`if (pair->one->sha1_valid)`
			`fill_line_ends(pair->one, &p_lines, &p_ends);`
			`fill_line_ends(pair->two, &t_lines, &t_ends);`

			`printf("%s%sdiff --git a/%s b/%s%s\n", prefix, c_meta, pair->one->path, pair->two->path, c_reset);`
			`printf("%s%s--- %s%s%s\n", prefix, c_meta,`
			`pair->one->sha1_valid ? "a/" : "",`
			`pair->one->sha1_valid ? pair->one->path : "/dev/null",`
			`c_reset);`
			`printf("%s%s+++ b/%s%s\n", prefix, c_meta, pair->two->path, c_reset);`
			`for (i = 0; i < range->ranges.nr; i++) {`
			`long p_start, p_end;`
			`long t_start = range->ranges.ranges[i].start;`
			`long t_end = range->ranges.ranges[i].end;`
			`long t_cur = t_start;`
			`int j_last;`

			`while (j < diff->target.nr && diff->target.ranges[j].end < t_start)`
			`j++;`
			`if (j == diff->target.nr \|\| diff->target.ranges[j].start > t_end)`
			`continue;`

			`/* Scan ahead to determine the last diff that falls in this range */`
			`j_last = j;`
			`while (j_last < diff->target.nr && diff->target.ranges[j_last].start < t_end)`
			`j_last++;`
			`if (j_last > j)`
			`j_last--;`

			`/*`
			`* Compute parent hunk headers: we know that the diff`
			`* has the correct line numbers (but not all hunks).`
			`* So it suffices to shift the start/end according to`
			`* the line numbers of the first/last hunk(s) that`
			`* fall in this range.`
			`*/`
			`if (t_start < diff->target.ranges[j].start)`
			`p_start = diff->parent.ranges[j].start - (diff->target.ranges[j].start-t_start);`
			`else`
			`p_start = diff->parent.ranges[j].start;`
			`if (t_end > diff->target.ranges[j_last].end)`
			`p_end = diff->parent.ranges[j_last].end + (t_end-diff->target.ranges[j_last].end);`
			`else`
			`p_end = diff->parent.ranges[j_last].end;`

			`if (!p_start && !p_end) {`
			`p_start = -1;`
			`p_end = -1;`
			`}`

			`/* Now output a diff hunk for this range */`
			`printf("%s%s@@ -%ld,%ld +%ld,%ld @@%s\n",`
			`prefix, c_frag,`
			`p_start+1, p_end-p_start, t_start+1, t_end-t_start,`
			`c_reset);`
			`while (j < diff->target.nr && diff->target.ranges[j].start < t_end) {`
			`int k;`
			`for (; t_cur < diff->target.ranges[j].start; t_cur++)`
			`print_line(prefix, ' ', t_cur, t_ends, pair->two->data,`
diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT The latter is a much more descriptive name (and we support "color.diff.context" now). This also updates the name of any local variables which were used to store the color. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-05-27 22:48:46 +02:00			`c_context, c_reset);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`for (k = diff->parent.ranges[j].start; k < diff->parent.ranges[j].end; k++)`
			`print_line(prefix, '-', k, p_ends, pair->one->data,`
			`c_old, c_reset);`
			`for (; t_cur < diff->target.ranges[j].end && t_cur < t_end; t_cur++)`
			`print_line(prefix, '+', t_cur, t_ends, pair->two->data,`
			`c_new, c_reset);`
			`j++;`
			`}`
			`for (; t_cur < t_end; t_cur++)`
			`print_line(prefix, ' ', t_cur, t_ends, pair->two->data,`
diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT The latter is a much more descriptive name (and we support "color.diff.context" now). This also updates the name of any local variables which were used to store the color. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-05-27 22:48:46 +02:00			`c_context, c_reset);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`}`

			`free(p_ends);`
			`free(t_ends);`
			`}`

			`/*`
			`* NEEDSWORK: manually building a diff here is not the Right`
			`* Thing(tm). log -L should be built into the diff pipeline.`
			`*/`
			`static void dump_diff_hacky(struct rev_info rev, struct line_log_data range)`
			`{`
			`puts(output_prefix(&rev->diffopt));`
			`while (range) {`
			`dump_diff_hacky_one(rev, range);`
			`range = range->next;`
			`}`
			`}`

			`/*`
			`* Unlike most other functions, this destructively operates on`
			`* 'range'.`
			`*/`
			`static int process_diff_filepair(struct rev_info *rev,`
			`struct diff_filepair *pair,`
			`struct line_log_data *range,`
			`struct diff_ranges **diff_out)`
			`{`
			`struct line_log_data *rg = range;`
			`struct range_set tmp;`
			`struct diff_ranges diff;`
			`mmfile_t file_parent, file_target;`

			`assert(pair->two->path);`
			`while (rg) {`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`assert(rg->path);`
			`if (!strcmp(rg->path, pair->two->path))`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`break;`
			`rg = rg->next;`
			`}`

			`if (!rg)`
			`return 0;`
			`if (rg->ranges.nr == 0)`
			`return 0;`

			`assert(pair->two->sha1_valid);`
			`diff_populate_filespec(pair->two, 0);`
			`file_target.ptr = pair->two->data;`
			`file_target.size = pair->two->size;`

			`if (pair->one->sha1_valid) {`
			`diff_populate_filespec(pair->one, 0);`
			`file_parent.ptr = pair->one->data;`
			`file_parent.size = pair->one->size;`
			`} else {`
			`file_parent.ptr = "";`
			`file_parent.size = 0;`
			`}`

			`diff_ranges_init(&diff);`
			`collect_diff(&file_parent, &file_target, &diff);`

			`/* NEEDSWORK should apply some heuristics to prevent mismatches */`
log -L: store the path instead of a diff_filespec line_log_data has held a diff_filespec* since the very early versions of the code. However, the only place in the code where we actually need the full filespec is parse_range_arg(); in all other cases, we are only interested in the path, so there is hardly a reason to store a filespec. Even worse, it causes a lot of redundant ->spec->path pointer dereferencing. And even worse, it caused the following bug. If you merge a rename with a modification to the old filename, like so: * Merge \| \ \| * Modify foo \| \| * \| Rename foo->bar \| / * Create foo we internally -- in process_ranges_merge_commit() -- scan all parents. We are mainly looking for one that doesn't have any modifications, so that we can assign all the blame to it and simplify away the merge. In doing so, we run the normal machinery on all parents in a loop. For each parent, we prepare a "working set" line_log_data by making a copy with line_log_data_copy(), which does not make a copy of the spec. Now suppose the rename is the first parent. The diff machinery tells us that the filepair is ('foo', 'bar'). We duly update the path we are interested in: rg->spec->path = xstrdup(pair->one->path); But that 'struct spec' is shared between the output line_log_data and the original input line_log_data. So we just wrecked the state of process_ranges_merge_commit(). When we get around to the second parent, the ranges tell us we are interested in a file 'foo' while the commits touch 'bar'. So most of this patch is just s/->spec->path/->path/ and associated management changes. This implicitly fixes the bug because we removed the shared parts between input and output of line_log_data_copy(); it is now safe to overwrite the path in the copy. There's one only somewhat related change: the comment in process_all_files() explains the reasoning behind using 'range' there. That bit of half-correct code had me sidetracked for a while. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:11 +02:00			`free(rg->path);`
			`rg->path = xstrdup(pair->one->path);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00
			`range_set_init(&tmp, 0);`
			`range_set_map_across_diff(&tmp, &rg->ranges, &diff, diff_out);`
			`range_set_release(&rg->ranges);`
			`range_set_move(&rg->ranges, &tmp);`

			`diff_ranges_release(&diff);`

			`return ((*diff_out)->parent.nr > 0);`
			`}`

			`static struct diff_filepair diff_filepair_dup(struct diff_filepair pair)`
			`{`
			`struct diff_filepair *new = xmalloc(sizeof(struct diff_filepair));`
			`new->one = pair->one;`
			`new->two = pair->two;`
			`new->one->count++;`
			`new->two->count++;`
			`return new;`
			`}`

			`static void free_diffqueues(int n, struct diff_queue_struct *dq)`
			`{`
			`int i, j;`
			`for (i = 0; i < n; i++)`
			`for (j = 0; j < dq[i].nr; j++)`
			`diff_free_filepair(dq[i].queue[j]);`
			`free(dq);`
			`}`

			`static int process_all_files(struct line_log_data **range_out,`
			`struct rev_info *rev,`
			`struct diff_queue_struct *queue,`
			`struct line_log_data *range)`
			`{`
			`int i, changed = 0;`

			`*range_out = line_log_data_copy(range);`

			`for (i = 0; i < queue->nr; i++) {`
			`struct diff_ranges *pairdiff = NULL;`
log -L: improve comments in process_all_files() The funny range assignment in process_all_files() had me sidetracked while investigating what led to the previous commit. Let's improve the comments. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:12 +02:00			`struct diff_filepair *pair = queue->queue[i];`
			`if (process_diff_filepair(rev, pair, *range_out, &pairdiff)) {`
			`/*`
			`* Store away the diff for later output. We`
			`* tuck it in the ranges we got as _input_,`
			`* since that's the commit that caused the`
			`* diff.`
			`*`
			`* NEEDSWORK not enough when we get around to`
			`* doing something interesting with merges;`
			`* currently each invocation on a merge parent`
			`* trashes the previous one's diff.`
			`*`
			`* NEEDSWORK tramples over data structures not owned here`
			`*/`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`struct line_log_data *rg = range;`
			`changed++;`
log -L: improve comments in process_all_files() The funny range assignment in process_all_files() had me sidetracked while investigating what led to the previous commit. Let's improve the comments. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-12 18:05:12 +02:00			`while (rg && strcmp(rg->path, pair->two->path))`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`rg = rg->next;`
			`assert(rg);`
			`rg->pair = diff_filepair_dup(queue->queue[i]);`
			`memcpy(&rg->diff, pairdiff, sizeof(struct diff_ranges));`
			`}`
line-log.c: fix a memleak The `filepair` is assigned new memory with any iteration via process_diff_filepair, so free it before the current iteration ends. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-03-31 03:22:07 +02:00			`free(pairdiff);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`}`

			`return changed;`
			`}`

			`int line_log_print(struct rev_info rev, struct commit commit)`
			`{`
			`struct line_log_data *range = lookup_line_range(rev, commit);`

			`show_log(rev);`
			`dump_diff_hacky(rev, range);`
			`return 1;`
			`}`

			`static int process_ranges_ordinary_commit(struct rev_info rev, struct commit commit,`
			`struct line_log_data *range)`
			`{`
			`struct commit *parent = NULL;`
			`struct diff_queue_struct queue;`
			`struct line_log_data *parent_range;`
			`int changed;`

			`if (commit->parents)`
			`parent = commit->parents->item;`

Speed up log -L... -M So far log -L only used the implicit diff filtering by pathspec. If the user specifies -M, we cannot do that, and so we simply handed the whole diff queue (which is approximately 'git show --raw') to diffcore_std(). Unfortunately this is very slow. We can optimize a lot if we throw out files that we know cannot possibly be interesting, in the same spirit that the pathspec filtering reduces the number of files. However, in this case, we have to be more careful. Because we want to look out for renames, we need to keep all filepairs where something was deleted. This is a bit hacky and should really be replaced by equivalent support in --follow, and just using that. However, in the meantime it speeds up 'log -M -L' by an order of magnitude. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:34 +01:00			`queue_diffs(range, &rev->diffopt, &queue, commit, parent);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`changed = process_all_files(&parent_range, rev, &queue, range);`
			`if (parent)`
			`add_line_range(rev, parent, parent_range);`
			`return changed;`
			`}`

			`static int process_ranges_merge_commit(struct rev_info rev, struct commit commit,`
			`struct line_log_data *range)`
			`{`
			`struct diff_queue_struct *diffqueues;`
			`struct line_log_data **cand;`
			`struct commit **parents;`
			`struct commit_list *p;`
			`int i;`
use commit_list_count() to count the members of commit_lists Call commit_list_count() instead of open-coding it repeatedly. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-07-17 01:52:09 +02:00			`int nparents = commit_list_count(commit->parents);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00
line-log: fix crash when --first-parent is used line-log tries to access all parents of a commit, but only the first parent has been loaded if "--first-parent" is specified, resulting in a crash. Limit the number of parents to one if "--first-parent" is specified. Reported-by: Eric N. Vander Weele <ericvw@gmail.com> Signed-off-by: Tzvetan Mikov <tmikov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-11-04 21:33:37 +01:00			`if (nparents > 1 && rev->first_parent_only)`
			`nparents = 1;`

Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`diffqueues = xmalloc(nparents * sizeof(*diffqueues));`
			`cand = xmalloc(nparents * sizeof(*cand));`
			`parents = xmalloc(nparents * sizeof(*parents));`

			`p = commit->parents;`
			`for (i = 0; i < nparents; i++) {`
			`parents[i] = p->item;`
			`p = p->next;`
Speed up log -L... -M So far log -L only used the implicit diff filtering by pathspec. If the user specifies -M, we cannot do that, and so we simply handed the whole diff queue (which is approximately 'git show --raw') to diffcore_std(). Unfortunately this is very slow. We can optimize a lot if we throw out files that we know cannot possibly be interesting, in the same spirit that the pathspec filtering reduces the number of files. However, in this case, we have to be more careful. Because we want to look out for renames, we need to keep all filepairs where something was deleted. This is a bit hacky and should really be replaced by equivalent support in --follow, and just using that. However, in the meantime it speeds up 'log -M -L' by an order of magnitude. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:34 +01:00			`queue_diffs(range, &rev->diffopt, &diffqueues[i], commit, parents[i]);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`}`

			`for (i = 0; i < nparents; i++) {`
			`int changed;`
			`cand[i] = NULL;`
			`changed = process_all_files(&cand[i], rev, &diffqueues[i], range);`
			`if (!changed) {`
			`/*`
			`* This parent can take all the blame, so we`
			`* don't follow any other path in history`
			`*/`
			`add_line_range(rev, parents[i], cand[i]);`
			`clear_commit_line_range(rev, commit);`
line-log: use commit_list_append() instead of duplicating its code Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-07-08 18:23:37 +02:00			`commit_list_append(parents[i], &commit->parents);`
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-03-28 17:47:32 +01:00			`free(parents);`
			`free(cand);`
			`free_diffqueues(nparents, diffqueues);`
			`/* NEEDSWORK leaking like a sieve */`
			`return 0;`
			`}`
			`}`

			`/*`
			`* No single parent took the blame. We add the candidates`
			`* from the above loop to the parents.`
			`*/`
			`for (i = 0; i < nparents; i++) {`
			`add_line_range(rev, parents[i], cand[i]);`
			`}`

			`clear_commit_line_range(rev, commit);`
			`free(parents);`
			`free(cand);`
			`free_diffqueues(nparents, diffqueues);`
			`return 1;`

			`/* NEEDSWORK evil merge detection stuff */`
			`/* NEEDSWORK leaking like a sieve */`
			`}`

			`static int process_ranges_arbitrary_commit(struct rev_info rev, struct commit commit)`
			`{`
			`struct line_log_data *range = lookup_line_range(rev, commit);`
			`int changed = 0;`

			`if (range) {`
			`if (!commit->parents \|\| !commit->parents->next)`
			`changed = process_ranges_ordinary_commit(rev, commit, range);`
			`else`
			`changed = process_ranges_merge_commit(rev, commit, range);`
			`}`

			`if (!changed)`
			`commit->object.flags \|= TREESAME;`

			`return changed;`
			`}`

			`static enum rewrite_result line_log_rewrite_one(struct rev_info rev, struct commit *pp)`
			`{`
			`for (;;) {`
			`struct commit p = pp;`
			`if (p->parents && p->parents->next)`
			`return rewrite_one_ok;`
			`if (p->object.flags & UNINTERESTING)`
			`return rewrite_one_ok;`
			`if (!(p->object.flags & TREESAME))`
			`return rewrite_one_ok;`
			`if (!p->parents)`
			`return rewrite_one_noparents;`
			`*pp = p->parents->item;`
			`}`
			`}`

			`int line_log_filter(struct rev_info *rev)`
			`{`
			`struct commit *commit;`
			`struct commit_list *list = rev->commits;`
			`struct commit_list out = NULL, *pp = &out;`

			`while (list) {`
			`struct commit_list *to_free = NULL;`
			`commit = list->item;`
			`if (process_ranges_arbitrary_commit(rev, commit)) {`
			`*pp = list;`
			`pp = &list->next;`
			`} else`
			`to_free = list;`
			`list = list->next;`
			`free(to_free);`
			`}`
			`*pp = NULL;`

			`for (list = out; list; list = list->next)`
			`rewrite_parents(rev, list->item, line_log_rewrite_one);`

			`rev->commits = out;`

			`return 0;`
			`}`