mirrors/git

mirror of https://github.com/git/git.git synced 2024-10-31 14:27:54 +01:00

2453 lines

61 KiB

C

git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`/*`
			`* Pickaxe`
			`*`
			`* Copyright (c) 2006, Junio C Hamano`
			`*/`

			`#include "cache.h"`
			`#include "builtin.h"`
			`#include "blob.h"`
			`#include "commit.h"`
			`#include "tag.h"`
			`#include "tree-walk.h"`
			`#include "diff.h"`
			`#include "diffcore.h"`
			`#include "revision.h"`
git-blame --incremental This adds --incremental option to help GUI porcelains to show the result from git-blame incrementally. The output gives the origin information in the same format as the porcelain format. The first line has commit object name, the line number of the first line in the group in the original file, the line number of that file in the final image, and number of lines in the group. Then subsequent lines show the metainformation for the commit when the commit is shown for the first time, except the filename information is always shown (we cannot even make it conditional to -C option as blame always follows the renaming of the file wholesale). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:34:06 +01:00			`#include "quote.h"`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`#include "xdiff-interface.h"`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`#include "cache-tree.h"`
Apply mailmap in git-blame output. This makes git-blame to use the same mailmap used by git-shortlog. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-27 09:42:15 +02:00			`#include "path-list.h"`
			`#include "mailmap.h"`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
git-pickaxe: retire pickaxe Just make it take over blame's place. Documentation and command have all stopped mentioning "git-pickaxe". The built-in synonym is left in the command table, so you can still say "git pickaxe", but it probably is a good idea to retire it as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-09 03:47:54 +01:00			`static char blame_usage[] =`
git-blame -w: ignore whitespace When refactoring code to split one iteration of a too deeply nested loop into a separate function, it inevitably makes the indentation levels shallower (that's the sole point of such a refactoring). With "git blame -w", you can ignore such re-indentation and pass blame for such moved lines to the parent. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-10 03:14:56 +02:00			`"git-blame [-c] [-b] [-l] [--root] [-t] [-f] [-n] [-s] [-p] [-w] [-L n,m] [-S <revs-file>] [-M] [-C] [-C] [--contents <filename>] [--incremental] [commit] [--] file\n"`
Update git-annotate/git-blame documentation Moved options that pertained to both git-blame and git-annotate to a common file blame-options.txt. builtin-blame.c: Removed --compatibility, --long, --time from the short usage as they are not handled in the code. Documentation/git-blame.txt: Removed common options to git-annotate. Added documentation for --score-debug. Removed --compatibility. Adjusted usage at top to not wrap on 80 columns. Documentation/git-annotate.txt: Using common options blame-options.txt. Documentation/blame-options.txt: Added -b note about associated config option, added --root note about associated config option, added documentation for --show-stats. Removed --long, --time, --rev-file as those options do not really exist. Added documentation for -M/-C taking an optional score argument for detection of moved lines. Signed-off-by: Andrew Ruder <andy@aeruder.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-16 08:20:34 +02:00			`" -c Use the same output mode as git-annotate (Default: off)\n"`
blame: -b (blame.blankboundary) and --root (blame.showroot) When blame.blankboundary is set (or -b option is given), commit object names are blanked out in the "human readable" output format for boundary commits. When blame.showroot is not set (or --root is not given), the root commits are treated as boundary commits. The code still attributes the lines to them, but with -b their object names are not shown. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-18 23:04:38 +01:00			`" -b Show blank SHA-1 for boundary commits (Default: off)\n"`
Update git-annotate/git-blame documentation Moved options that pertained to both git-blame and git-annotate to a common file blame-options.txt. builtin-blame.c: Removed --compatibility, --long, --time from the short usage as they are not handled in the code. Documentation/git-blame.txt: Removed common options to git-annotate. Added documentation for --score-debug. Removed --compatibility. Adjusted usage at top to not wrap on 80 columns. Documentation/git-annotate.txt: Using common options blame-options.txt. Documentation/blame-options.txt: Added -b note about associated config option, added --root note about associated config option, added documentation for --show-stats. Removed --long, --time, --rev-file as those options do not really exist. Added documentation for -M/-C taking an optional score argument for detection of moved lines. Signed-off-by: Andrew Ruder <andy@aeruder.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-16 08:20:34 +02:00			`" -l Show long commit SHA1 (Default: off)\n"`
blame: -b (blame.blankboundary) and --root (blame.showroot) When blame.blankboundary is set (or -b option is given), commit object names are blanked out in the "human readable" output format for boundary commits. When blame.showroot is not set (or --root is not given), the root commits are treated as boundary commits. The code still attributes the lines to them, but with -b their object names are not shown. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-18 23:04:38 +01:00			`" --root Do not treat root commits as boundaries (Default: off)\n"`
Update git-annotate/git-blame documentation Moved options that pertained to both git-blame and git-annotate to a common file blame-options.txt. builtin-blame.c: Removed --compatibility, --long, --time from the short usage as they are not handled in the code. Documentation/git-blame.txt: Removed common options to git-annotate. Added documentation for --score-debug. Removed --compatibility. Adjusted usage at top to not wrap on 80 columns. Documentation/git-annotate.txt: Using common options blame-options.txt. Documentation/blame-options.txt: Added -b note about associated config option, added --root note about associated config option, added documentation for --show-stats. Removed --long, --time, --rev-file as those options do not really exist. Added documentation for -M/-C taking an optional score argument for detection of moved lines. Signed-off-by: Andrew Ruder <andy@aeruder.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-16 08:20:34 +02:00			`" -t Show raw timestamp (Default: off)\n"`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`" -f, --show-name Show original filename (Default: auto)\n"`
			`" -n, --show-number Show original linenumber (Default: off)\n"`
blame -s: suppress author name and time. With this "git blame -b -s HEAD~n..HEAD" becomes a nicer way to review the result of recent changes in context. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 00:50:45 +02:00			`" -s Suppress author name and timestamp (Default: off)\n"`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`" -p, --porcelain Show in a format designed for machine consumption\n"`
git-blame -w: ignore whitespace When refactoring code to split one iteration of a too deeply nested loop into a separate function, it inevitably makes the indentation levels shallower (that's the sole point of such a refactoring). With "git blame -w", you can ignore such re-indentation and pass blame for such moved lines to the parent. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-10 03:14:56 +02:00			`" -w Ignore whitespace differences\n"`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`" -L n,m Process only line range n,m, counting from 1\n"`
git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00			`" -M, -C Find line movements within and across files\n"`
git-blame --incremental This adds --incremental option to help GUI porcelains to show the result from git-blame incrementally. The output gives the origin information in the same format as the porcelain format. The first line has commit object name, the line number of the first line in the group in the original file, the line number of that file in the final image, and number of lines in the group. Then subsequent lines show the metainformation for the commit when the commit is shown for the first time, except the filename information is always shown (we cannot even make it conditional to -C option as blame always follows the renaming of the file wholesale). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:34:06 +01:00			`" --incremental Show blame entries as we find them, incrementally\n"`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`" --contents file Use <file>'s contents as the final image\n"`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`" -S revs-file Use revisions from revs-file instead of calling git-rev-list\n";`

			`static int longest_file;`
			`static int longest_author;`
			`static int max_orig_digits;`
			`static int max_digits;`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`static int max_score_digits;`
blame: -b (blame.blankboundary) and --root (blame.showroot) When blame.blankboundary is set (or -b option is given), commit object names are blanked out in the "human readable" output format for boundary commits. When blame.showroot is not set (or --root is not given), the root commits are treated as boundary commits. The code still attributes the lines to them, but with -b their object names are not shown. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-18 23:04:38 +01:00			`static int show_root;`
			`static int blank_boundary;`
git-blame --incremental This adds --incremental option to help GUI porcelains to show the result from git-blame incrementally. The output gives the origin information in the same format as the porcelain format. The first line has commit object name, the line number of the first line in the group in the original file, the line number of that file in the final image, and number of lines in the group. Then subsequent lines show the metainformation for the commit when the commit is shown for the first time, except the filename information is always shown (we cannot even make it conditional to -C option as blame always follows the renaming of the file wholesale). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:34:06 +01:00			`static int incremental;`
annotate: fix for cvsserver. git-cvsserver does not want the boundary commits shown any differently. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-06 10:52:04 +01:00			`static int cmd_is_annotate;`
git-blame -w: ignore whitespace When refactoring code to split one iteration of a too deeply nested loop into a separate function, it inevitably makes the indentation levels shallower (that's the sole point of such a refactoring). With "git blame -w", you can ignore such re-indentation and pass blame for such moved lines to the parent. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-10 03:14:56 +02:00			`static int xdl_opts = XDF_NEED_MINIMAL;`
Apply mailmap in git-blame output. This makes git-blame to use the same mailmap used by git-shortlog. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-27 09:42:15 +02:00			`static struct path_list mailmap;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`#ifndef DEBUG`
			`#define DEBUG 0`
			`#endif`

git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`/* stats */`
			`static int num_read_blob;`
			`static int num_get_patch;`
			`static int num_commits;`

git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00			`#define PICKAXE_BLAME_MOVE 01`
git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00			`#define PICKAXE_BLAME_COPY 02`
			`#define PICKAXE_BLAME_COPY_HARDER 04`
blame: -C -C -C When you do this, existing "blame -C -C" would not find that the latter half of the file2 came from the existing file1: ... both file1 and file2 are tracked ... $ cat file1 >>file2 $ git add file1 file2 $ git commit This is because we avoid the expensive find-copies-harder code that makes unchanged file (in this case, file1) as a candidate for copy & paste source when annotating an existing file (file2). The third -C now allows it. However, this obviously makes the process very expensive. We've actually seen this patch before, but I dismissed it because it covers such a narrow (and arguably stupid) corner case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-06 06:18:57 +02:00			`#define PICKAXE_BLAME_COPY_HARDEST 010`
git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00
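The `PICKAXE_BLAME_*` values are octal bit flags, so a single `int` can carry any combination of them. The following is a minimal sketch of how repeated `-C` options could escalate the detection level, as the "-C -C -C" commit message describes. The helper name `add_copy_level()` is hypothetical, and the rule that `-C` also turns on move detection is an assumption, since the option parser is not part of this excerpt.

```c
#include <assert.h>

#define PICKAXE_BLAME_MOVE		01
#define PICKAXE_BLAME_COPY		02
#define PICKAXE_BLAME_COPY_HARDER	04
#define PICKAXE_BLAME_COPY_HARDEST	010

/*
 * Hypothetical helper (not taken from the file): fold one more "-C"
 * into the option bits.  Each call enables the next, more expensive
 * copy-detection level; enabling MOVE as well is an assumption.
 */
static int add_copy_level(int opt)
{
	if (opt & PICKAXE_BLAME_COPY_HARDER)
		opt |= PICKAXE_BLAME_COPY_HARDEST;
	if (opt & PICKAXE_BLAME_COPY)
		opt |= PICKAXE_BLAME_COPY_HARDER;
	opt |= PICKAXE_BLAME_COPY | PICKAXE_BLAME_MOVE;
	assert(opt & PICKAXE_BLAME_COPY);
	return opt;
}
```

Checking the higher levels before setting the lower one means each call raises the level by exactly one step.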
git-pickaxe: introduce heuristics to avoid "trivial" chunks This adds scoring logic to blame_entry to prevent blames on very trivial chunks (e.g. lots of empty lines, indent followed by a closing brace) from being passed down to unrelated lines in the parent. The current heuristics are quite simple and may need to be tweaked later, but we need to start somewhere. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 00:37:12 +02:00			`/*`
			`* blame for a blame_entry with score lower than these thresholds`
			`* is not passed to the parent using move/copy logic.`
			`*/`
			`static unsigned blame_move_score;`
			`static unsigned blame_copy_score;`
			`#define BLAME_DEFAULT_MOVE_SCORE 20`
			`#define BLAME_DEFAULT_COPY_SCORE 40`
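The "best match" commit message says the score counts alphanumeric characters, so blank lines and lone closing braces contribute nothing. A rough sketch of that idea follows; `count_alnum()` and `worth_passing()` are hypothetical names, and the scoring function actually used by the file may differ in detail.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define BLAME_DEFAULT_MOVE_SCORE 20

/* Hypothetical scorer: count alphanumeric bytes in buf[0..len). */
static unsigned count_alnum(const char *buf, size_t len)
{
	unsigned score = 0;
	size_t i;

	assert(buf || !len);
	for (i = 0; i < len; i++)
		if (isalnum((unsigned char)buf[i]))
			score++;
	return score;
}

/*
 * Nonzero when the chunk reaches the threshold; chunks scoring lower
 * than the threshold are not passed to the parent.
 */
static int worth_passing(const char *buf, size_t len, unsigned threshold)
{
	return threshold <= count_alnum(buf, len);
}
```

With the default move threshold of 20, a chunk made only of whitespace and a brace scores 0 and stays with the current suspect.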

git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`/* bits #0..7 in revision.h, #8..11 used for merge_bases() in commit.c */`
			`#define METAINFO_SHOWN (1u<<12)`
			`#define MORE_THAN_ONE_PATH (1u<<13)`
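These two bits sit above the ones reserved elsewhere, so they can share one flags word with them. Going by the name and by the `--incremental` commit message (metainformation is shown when a commit appears for the first time), `METAINFO_SHOWN` can be read as a "print once" marker. The sketch below illustrates that reading; `first_time_shown()` is a hypothetical helper, not a function from this file.

```c
#include <assert.h>

#define METAINFO_SHOWN		(1u<<12)
#define MORE_THAN_ONE_PATH	(1u<<13)

/*
 * Hypothetical helper: return 1 the first time it sees a given flags
 * word and mark it, 0 on every later call.  Other bits are untouched.
 */
static int first_time_shown(unsigned *flags)
{
	assert(flags);
	if (*flags & METAINFO_SHOWN)
		return 0;
	*flags |= METAINFO_SHOWN;
	return 1;
}
```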

			`/*`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`* One blob in a commit that is being suspected`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`*/`
			`struct origin {`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`int refcnt;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`struct commit *commit;`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`mmfile_t file;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`unsigned char blob_sha1[20];`
			`char path[FLEX_ARRAY];`
			`};`
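Because `path` is the last member and is declared with `FLEX_ARRAY`, the struct and its path string can live in one allocation. The sketch below shows that allocation pattern with a cut-down stand-in struct, assuming `FLEX_ARRAY` expands to a C99 flexible array member (the macro is defined outside this excerpt). `struct origin_sketch` and `make_origin_sketch()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-in for struct origin; only what the sketch needs. */
struct origin_sketch {
	int refcnt;
	char path[];	/* C99 flexible array member */
};

/* Hypothetical constructor: one block holds the struct and its path. */
static struct origin_sketch *make_origin_sketch(const char *path)
{
	size_t len = strlen(path);
	struct origin_sketch *o = calloc(1, sizeof(*o) + len + 1);

	assert(o);	/* sketch only; real code would report the failure */
	memcpy(o->path, path, len + 1);	/* copies the trailing NUL too */
	o->refcnt = 1;
	return o;
}
```

A single `free()` releases both the struct and the path, which keeps the release path short.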

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Given an origin, prepare mmfile_t structure to be used by the`
			`* diff machinery`
			`*/`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`static char *fill_origin_blob(struct origin *o, mmfile_t *file)`
			`{`
			`if (!o->file.ptr) {`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`num_read_blob++;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`file->ptr = read_sha1_file(o->blob_sha1, &type,`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`(unsigned long *)(&(file->size)));`
blame: check return value from read_sha1_file() Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-08-25 10:26:20 +02:00			`if (!file->ptr)`
			`die("Cannot read blob %s for path %s",`
			`sha1_to_hex(o->blob_sha1),`
			`o->path);`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`o->file = *file;`
			`}`
			`else`
			`*file = o->file;`
			`return file->ptr;`
			`}`
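`fill_origin_blob()` follows a read-once pattern: the first call loads the blob and stores it in the origin, and later calls copy the cached descriptor back to the caller. The self-contained sketch below reproduces that pattern with stand-in types. `struct buf`, `struct cached`, `load()` and `fill_cached()` are all hypothetical, and `load()` merely copies a string where the real code reads an object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buf { char *ptr; long size; };			/* stand-in for mmfile_t */
struct cached { struct buf file; const char *src; };	/* stand-in for struct origin */

static int num_loads;	/* plays the role of num_read_blob */

/* Hypothetical loader standing in for read_sha1_file(). */
static char *load(const char *src, long *size)
{
	char *p;

	*size = (long)strlen(src);
	p = malloc(*size + 1);
	assert(p);
	memcpy(p, src, *size + 1);
	num_loads++;
	return p;
}

/* Load on first use, then hand back the cached copy. */
static char *fill_cached(struct cached *c, struct buf *file)
{
	if (!c->file.ptr) {
		file->ptr = load(c->src, &file->size);
		c->file = *file;
	} else {
		*file = c->file;
	}
	return file->ptr;
}
```

Both callers end up pointing at the same buffer, so ownership stays with the cache and callers must not free what they receive.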

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Origin is refcounted and usually we keep the blob contents to be`
			`* reused.`
			`*/`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`static inline struct origin *origin_incref(struct origin *o)`
			`{`
			`if (o)`
			`o->refcnt++;`
			`return o;`
			`}`

			`static void origin_decref(struct origin *o)`
			`{`
			`if (o && --o->refcnt <= 0) {`
Avoid unnecessary "if-before-free" tests. This change removes all obvious useless if-before-free tests. E.g., it replaces code like this: if (some_expression) free (some_expression); with the now-equivalent: free (some_expression); It is equivalent not just because POSIX has required free(NULL) to work for a long time, but simply because it has worked for so long that no reasonable porting target fails the test. Here's some evidence from nearly 1.5 years ago: http://www.winehq.org/pipermail/wine-patches/2006-October/031544.html FYI, the change below was prepared by running the following: git ls-files -z \| xargs -0 \ perl -0x3b -pi -e \ 's/\bif\s\(\s(\S+?)(?:\s!=\sNULL)?\s\)\s+(free\s\(\s\1\s\))/$2/s' Note however, that it doesn't handle brace-enclosed blocks like "if (x) { free (x); }". But that's ok, since there were none like that in git sources. Beware: if you do use the above snippet, note that it can produce syntactically invalid C code. That happens when the affected "if"-statement has a matching "else". E.g., it would transform this if (x) free (x); else foo (); into this: free (x); else foo (); There were none of those here, either. If you're interested in automating detection of the useless tests, you might like the useless-if-before-free script in gnulib: [it does detect brace-enclosed free statements, and has a --name=S option to make it detect free-like functions with different names] http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=build-aux/useless-if-before-free Addendum: Remove one more (in imap-send.c), spotted by Jean-Luc Herren <jlh@gmx.ch>. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-31 18:26:32 +01:00			`free(o->file.ptr);`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`free(o);`
			`}`
			`}`
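The incref/decref pair gives each holder of an origin its own reference; the object and its cached blob are freed only when the last holder lets go. The sketch below mirrors that lifecycle on a stand-in struct so it can be run in isolation. `struct counted`, its helpers, and the `num_freed` counter are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct counted { int refcnt; char *data; };	/* stand-in for struct origin */

static int num_freed;	/* instrumentation for this sketch only */

static struct counted *counted_incref(struct counted *c)
{
	if (c)
		c->refcnt++;
	return c;
}

static void counted_decref(struct counted *c)
{
	assert(!c || c->refcnt > 0);	/* catch a double release early */
	if (c && --c->refcnt <= 0) {
		free(c->data);	/* free(NULL) is a no-op */
		free(c);
		num_freed++;
	}
}
```

Returning the pointer from the incref helper lets a caller take a reference and store it in one expression, for example `entry->suspect = counted_incref(c)` in a hypothetical caller.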

blame: drop blob data after passing blame to the parent We used to keep the blob data for each origin that has any remaining line in the result, but this will get very costly with a huge file that has a deep history. This patch releases the blob after we ran diff between the child rev and its parents. When passing blame from a parent to its parent (i.e. the grandparent), the blob data for the parent may need to be read again, but it should be relatively cheap, thanks to delta-base cache. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-12 01:05:50 +01:00			`static void drop_origin_blob(struct origin *o)`
			`{`
			`if (o->file.ptr) {`
			`free(o->file.ptr);`
			`o->file.ptr = NULL;`
			`}`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Each group of lines is described by a blame_entry; it can be split`
			`* as we pass blame to the parents. They form a linked list in the`
			`* scoreboard structure, sorted by the target line number.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`struct blame_entry {`
			`struct blame_entry *prev;`
			`struct blame_entry *next;`

			`/* the first line of this group in the final image;`
			`* internally all line numbers are 0 based.`
			`*/`
			`int lno;`

			`/* how many lines this group has */`
			`int num_lines;`

			`/* the commit that introduced this group into the final image */`
			`struct origin *suspect;`

			`/* true if the suspect is truly guilty; false while we have not`
			`* checked if the group came from one of its parents.`
			`*/`
			`char guilty;`

			`/* the line number of the first line of this group in the`
			`* suspect's file; internally all line numbers are 0 based.`
			`*/`
			`int s_lno;`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00
			`/* how significant this entry is -- cached to avoid`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`* scanning the lines over and over.`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`*/`
			`unsigned score;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`};`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* The current state of the blame assignment.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`struct scoreboard {`
			`/* the final commit (i.e. where we started digging from) */`
			`struct commit *final;`

			`const char *path;`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* The contents in the final image.`
			`* Used by many functions to obtain contents of the nth line,`
			`* indexed with scoreboard.lineno[blame_entry.lno].`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`*/`
			`const char *final_buf;`
			`unsigned long final_buf_size;`

			`/* linked list of blames */`
			`struct blame_entry *ent;`

git-pickaxe: do not keep commit buffer. We need the commit buffer data while generating the final result, but until then we do not need them. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 08:49:31 +02:00			`/* look-up a line in the final buffer */`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`int num_lines;`
			`int *lineno;`
			`};`

blame: cmp_suspect is not "cmp" anymore. The earlier round makes the function return "is it different" and it does not return a value suitable for sorting anymore. Reverse the logic to return "are they the same suspect" instead, and rename it to "same_suspect()". Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 07:37:51 +01:00			`static inline int same_suspect(struct origin *a, struct origin *b)`
git-pickaxe: do not confuse two origins that are the same. It used to be that we can compare the address of the origin structure to determine if they are the same because they are always registered with scoreboard. After introduction of the loop to try finding the best split, that is not true anymore. The current code has rather serious leaks with origin structure, but more importantly it gets confused when two origins that points at the same commit and same path. We might eventually have to refcount and gc origin, but let's fix the correctness issue first. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 09:41:38 +02:00			`{`
blame: cmp_suspect is not "cmp" anymore. The earlier round makes the function return "is it different" and it does not return a value suitable for sorting anymore. Reverse the logic to return "are they the same suspect" instead, and rename it to "same_suspect()". Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 07:37:51 +01:00			`if (a == b)`
blame: micro-optimize cmp_suspect() The commit structures are guaranteed their uniqueness by the object layer, so we can check their address and see if they are the same without going down to the object sha1 level. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-20 06:17:10 +01:00			`return 1;`
blame: cmp_suspect is not "cmp" anymore. The earlier round makes the function return "is it different" and it does not return a value suitable for sorting anymore. Reverse the logic to return "are they the same suspect" instead, and rename it to "same_suspect()". Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 07:37:51 +01:00			`if (a->commit != b->commit)`
			`return 0;`
			`return !strcmp(a->path, b->path);`
git-pickaxe: do not confuse two origins that are the same. It used to be that we can compare the address of the origin structure to determine if they are the same because they are always registered with scoreboard. After introduction of the loop to try finding the best split, that is not true anymore. The current code has rather serious leaks with origin structure, but more importantly it gets confused when two origins that points at the same commit and same path. We might eventually have to refcount and gc origin, but let's fix the correctness issue first. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 09:41:38 +02:00			`}`
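The comparison order in `same_suspect()` (pointer identity first, then commit identity, then a string compare on the path) can be tried out with simplified types. The sketch below assumes, as the commit messages above state, that commit objects are unique per commit so their addresses can be compared directly; `toy_commit`, `toy_origin` and `toy_same_suspect()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct commit and struct origin. */
struct toy_commit {
	int id;
};

struct toy_origin {
	const struct toy_commit *commit;
	const char *path;
};

/* Return 1 if both origins name the same <commit, path> pair, else 0. */
static int toy_same_suspect(const struct toy_origin *a,
			    const struct toy_origin *b)
{
	if (a == b)
		return 1;	/* same structure: cheapest test first */
	if (a->commit != b->commit)
		return 0;	/* different commits can never match */
	return !strcmp(a->path, b->path);
}
```

The string comparison only runs when two distinct origin structures point at the same commit, which is the case the "do not confuse two origins" commit set out to handle.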

git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`static void sanity_check_refcnt(struct scoreboard *);`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* If two blame entries that are next to each other came from`
			`* contiguous lines in the same origin (i.e. <commit, path> pair),`
			`* merge them together.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static void coalesce(struct scoreboard *sb)`
			`{`
			`struct blame_entry *ent, *next;`

			`for (ent = sb->ent; ent && (next = ent->next); ent = next) {`
blame: cmp_suspect is not "cmp" anymore. The earlier round makes the function return "is it different" and it does not return a value suitable for sorting anymore. Reverse the logic to return "are they the same suspect" instead, and rename it to "same_suspect()". Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 07:37:51 +01:00			`if (same_suspect(ent->suspect, next->suspect) &&`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`ent->guilty == next->guilty &&`
			`ent->s_lno + ent->num_lines == next->s_lno) {`
			`ent->num_lines += next->num_lines;`
			`ent->next = next->next;`
			`if (ent->next)`
			`ent->next->prev = ent;`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`origin_decref(next->suspect);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`free(next);`
git-pickaxe: do not confuse two origins that are the same. It used to be that we can compare the address of the origin structure to determine if they are the same because they are always registered with scoreboard. After introduction of the loop to try finding the best split, that is not true anymore. The current code has rather serious leaks with origin structure, but more importantly it gets confused when two origins that points at the same commit and same path. We might eventually have to refcount and gc origin, but let's fix the correctness issue first. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 09:41:38 +02:00			`ent->score = 0;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`next = ent; /* again */`
			`}`
			`}`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00
			`if (DEBUG) /* sanity */`
			`sanity_check_refcnt(sb);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
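The merge loop in `coalesce()` can be exercised on its own with a reduced entry type. This is a sketch under stated assumptions: `toy_entry`, `toy_append()` and `toy_coalesce()` are hypothetical, and an integer `suspect_id` stands in for the `<commit, path>` origin, so there is no refcounting or score reset here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced version of struct blame_entry. */
struct toy_entry {
	struct toy_entry *prev, *next;
	int lno;	/* first line in the final image, 0-based */
	int num_lines;	/* number of lines in the group */
	int s_lno;	/* first line in the suspect's file, 0-based */
	int suspect_id;	/* stands in for the <commit, path> origin */
	char guilty;
};

/* Append a new entry after tail (tail may be NULL) and return it. */
static struct toy_entry *toy_append(struct toy_entry *tail, int lno,
				    int num_lines, int s_lno, int suspect_id)
{
	struct toy_entry *e = calloc(1, sizeof(*e));
	if (!e)
		abort();
	e->lno = lno;
	e->num_lines = num_lines;
	e->s_lno = s_lno;
	e->suspect_id = suspect_id;
	e->prev = tail;
	if (tail)
		tail->next = e;
	return e;
}

/* Merge neighbours that cover contiguous lines of the same suspect. */
static void toy_coalesce(struct toy_entry *head)
{
	struct toy_entry *ent, *next;

	for (ent = head; ent && (next = ent->next); ent = next) {
		if (ent->suspect_id == next->suspect_id &&
		    ent->guilty == next->guilty &&
		    ent->s_lno + ent->num_lines == next->s_lno) {
			ent->num_lines += next->num_lines;
			ent->next = next->next;
			if (ent->next)
				ent->next->prev = ent;
			free(next);
			next = ent; /* stay on ent and look at its new neighbour */
		}
	}
}
```

Assigning `next = ent` after a merge means the loop re-examines the same entry against its new neighbour, so a run of three or more mergeable entries collapses in a single pass.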

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Given a commit and a path in it, create a new origin structure.`
			`* The callers that add blame to the scoreboard should use`
			`* get_origin() to obtain shared, refcounted copy instead of calling`
			`* this function directly.`
			`*/`
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`static struct origin *make_origin(struct commit *commit, const char *path)`
			`{`
			`struct origin *o;`
			`o = xcalloc(1, sizeof(*o) + strlen(path) + 1);`
			`o->commit = commit;`
			`o->refcnt = 1;`
			`strcpy(o->path, path);`
			`return o;`
			`}`
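The allocation in `make_origin()` sizes the structure as `sizeof(*o) + strlen(path) + 1`, which suggests the path is stored in a trailing array in the same block as the structure. A minimal sketch of that pattern together with reference counting follows; it assumes a C99 flexible array member, and `toy_origin`, `toy_make_origin()`, `toy_incref()` and `toy_decref()` are hypothetical names rather than the helpers used in this file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical origin: header fields plus the path in one allocation. */
struct toy_origin {
	int refcnt;
	int commit_id;	/* stands in for struct commit * */
	char path[];	/* flexible array member, NUL-terminated */
};

/* Allocate header and path together; the caller owns one reference. */
static struct toy_origin *toy_make_origin(int commit_id, const char *path)
{
	struct toy_origin *o = calloc(1, sizeof(*o) + strlen(path) + 1);
	if (!o)
		abort();
	o->commit_id = commit_id;
	o->refcnt = 1;
	strcpy(o->path, path);
	return o;
}

/* Take an additional reference and return the same pointer. */
static struct toy_origin *toy_incref(struct toy_origin *o)
{
	if (o)
		o->refcnt++;
	return o;
}

/* Drop one reference; returns 1 if the origin was freed, 0 otherwise. */
static int toy_decref(struct toy_origin *o)
{
	if (!o)
		return 0;
	assert(o->refcnt > 0);
	if (--o->refcnt > 0)
		return 0;
	free(o);
	return 1;
}
```

Keeping the path inside the same allocation means a single `free()` releases everything, which keeps the decref path short.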

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Locate an existing origin or create a new one.`
			`*/`
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`static struct origin *get_origin(struct scoreboard *sb,`
			`struct commit *commit,`
			`const char *path)`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`{`
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`struct blame_entry *e;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`for (e = sb->ent; e; e = e->next) {`
			`if (e->suspect->commit == commit &&`
			`!strcmp(e->suspect->path, path))`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`return origin_incref(e->suspect);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`return make_origin(commit, path);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Fill the blob_sha1 field of an origin if it hasn't, so that later`
			`* call to fill_origin_blob() can use it to locate the data. blob_sha1`
			`* for an origin is also used to pass the blame for the entire file to`
			`* the parent to detect the case where a child's blob is identical to`
			`* that of its parent's.`
			`*/`
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`static int fill_blob_sha1(struct origin *origin)`
			`{`
			`unsigned mode;`

			`if (!is_null_sha1(origin->blob_sha1))`
			`return 0;`
			`if (get_tree_entry(origin->commit->object.sha1,`
			`origin->path,`
			`origin->blob_sha1, &mode))`
			`goto error_out;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`if (sha1_object_info(origin->blob_sha1, NULL) != OBJ_BLOB)`
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`goto error_out;`
			`return 0;`
			`error_out:`
			`hashclr(origin->blob_sha1);`
			`return -1;`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* We have an origin -- check if the same path exists in the`
			`* parent and return an origin structure to represent it.`
			`*/`
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`static struct origin *find_origin(struct scoreboard *sb,`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`struct commit *parent,`
			`struct origin *origin)`
			`{`
			`struct origin *porigin = NULL;`
			`struct diff_options diff_opts;`
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`const char *paths[2];`

git-pickaxe: cache one already found path per commit. Depending on how bushy the commit DAG is, this saves calls to the internal diff-tree for fork-point commits. For example, annotating Makefile in the kernel repository saves about a third of such diff-tree calls. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 10:00:01 +01:00			`if (parent->util) {`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Each commit object can cache one origin in that`
			`* commit. This is a freestanding copy of origin and`
			`* not refcounted.`
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`*/`
git-pickaxe: cache one already found path per commit. Depending on how bushy the commit DAG is, this saves calls to the internal diff-tree for fork-point commits. For example, annotating Makefile in the kernel repository saves about a third of such diff-tree calls. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 10:00:01 +01:00			`struct origin *cached = parent->util;`
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`if (!strcmp(cached->path, origin->path)) {`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* The same path between origin and its parent`
			`* without renaming -- the most common case.`
			`*/`
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`porigin = get_origin(sb, parent, cached->path);`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00
			`/*`
			`* If the origin was newly created (i.e. get_origin`
			`* would call make_origin if none is found in the`
			`* scoreboard), it does not know the blob_sha1,`
			`* so copy it. Otherwise porigin was in the`
			`* scoreboard and already knows blob_sha1.`
			`*/`
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`if (porigin->refcnt == 1)`
			`hashcpy(porigin->blob_sha1, cached->blob_sha1);`
			`return porigin;`
			`}`
			`/* otherwise it was not very useful; free it */`
			`free(parent->util);`
			`parent->util = NULL;`
git-pickaxe: cache one already found path per commit. Depending on how bushy the commit DAG is, this saves calls to the internal diff-tree for fork-point commits. For example, annotating Makefile in the kernel repository saves about a third of such diff-tree calls. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 10:00:01 +01:00			`}`

git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`/* See if the origin->path is different between parent`
			`* and origin first. Most of the time they are the`
			`* same and diff-tree is fairly efficient about this.`
			`*/`
			`diff_setup(&diff_opts);`
Make the diff_options bitfields be an unsigned with explicit masks. reverse_diff was a bit-value in disguise, it's merged in the flags now. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 20:05:14 +01:00			`DIFF_OPT_SET(&diff_opts, RECURSIVE);`
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`diff_opts.detect_rename = 0;`
			`diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;`
			`paths[0] = origin->path;`
			`paths[1] = NULL;`

			`diff_tree_setup_paths(paths, &diff_opts);`
			`if (diff_setup_done(&diff_opts) < 0)`
			`die("diff-setup");`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00
			`if (is_null_sha1(origin->commit->object.sha1))`
			`do_diff_cache(parent->tree->object.sha1, &diff_opts);`
			`else`
			`diff_tree_sha1(parent->tree->object.sha1,`
			`origin->commit->tree->object.sha1,`
			`"", &diff_opts);`
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`diffcore_std(&diff_opts);`

			`/* It is either one entry that says "modified", or "created",`
			`* or nothing.`
			`*/`
			`if (!diff_queued_diff.nr) {`
			`/* The path is the same as parent */`
			`porigin = get_origin(sb, parent, origin->path);`
			`hashcpy(porigin->blob_sha1, origin->blob_sha1);`
			`}`
			`else if (diff_queued_diff.nr != 1)`
git-pickaxe: retire pickaxe Just make it take over blame's place. Documentation and command have all stopped mentioning "git-pickaxe". The built-in synonym is left in the command table, so you can still say "git pickaxe", but it probably is a good idea to retire it as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-09 03:47:54 +01:00			`die("internal error in blame::find_origin");`
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`else {`
			`struct diff_filepair *p = diff_queued_diff.queue[0];`
			`switch (p->status) {`
			`default:`
git-pickaxe: retire pickaxe Just make it take over blame's place. Documentation and command have all stopped mentioning "git-pickaxe". The built-in synonym is left in the command table, so you can still say "git pickaxe", but it probably is a good idea to retire it as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-09 03:47:54 +01:00			`die("internal error in blame::find_origin (%c)",`
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`p->status);`
			`case 'M':`
			`porigin = get_origin(sb, parent, origin->path);`
			`hashcpy(porigin->blob_sha1, p->one->sha1);`
			`break;`
			`case 'A':`
			`case 'T':`
			`/* Did not exist in parent, or type changed */`
			`break;`
			`}`
			`}`
			`diff_flush(&diff_opts);`
Fix small memory leaks induced by diff_tree_setup_paths Run diff_tree_release_paths in the appropriate places, and add a test to avoid NULL dereference. Better safe than sorry. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-11 22:59:55 +01:00			`diff_tree_release_paths(&diff_opts);`
git-pickaxe: cache one already found path per commit. Depending on how bushy the commit DAG is, this saves calls to the internal diff-tree for fork-point commits. For example, annotating Makefile in the kernel repository saves about a third of such diff-tree calls. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 10:00:01 +01:00			`if (porigin) {`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Create a freestanding copy that is not part of`
			`* the refcounted origin found in the scoreboard, and`
			`* cache it in the commit.`
			`*/`
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`struct origin *cached;`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`cached = make_origin(porigin->commit, porigin->path);`
			`hashcpy(cached->blob_sha1, porigin->blob_sha1);`
			`parent->util = cached;`
git-pickaxe: cache one already found path per commit. Depending on how bushy the commit DAG is, this saves calls to the internal diff-tree for fork-point commits. For example, annotating Makefile in the kernel repository saves about a third of such diff-tree calls. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 10:00:01 +01:00			`}`
git-pickaxe: split find_origin() into find_rename() and find_origin(). When a merge adds a new file from the second parent, the earlier code tried to find renames in the first parent before noticing that the vertion from the second parent was added without modification. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 02:17:41 +01:00			`return porigin;`
			`}`
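
The decision that find_origin() makes after its single-path diff can be summarised as a small classifier. This is only a sketch under stated assumptions: the enum and the function name are hypothetical, and the real code inspects diff_queued_diff directly and calls die() on unexpected input instead of returning a value.

```c
#include <assert.h>

/* Hypothetical outcome of looking up one path in a parent commit. */
enum parent_lookup {
	SAME_BLOB,		/* diff is empty: parent has the identical blob */
	MODIFIED_BLOB,		/* status 'M': path exists in parent with other content */
	NOT_IN_PARENT,		/* status 'A' or 'T': added, or type changed */
	UNEXPECTED_DIFF		/* anything else; the real code dies here */
};

/*
 * nr is the number of queued filepairs produced by a diff limited to
 * one path; status is the status letter of the only pair when nr == 1.
 */
static enum parent_lookup classify_single_path_diff(int nr, char status)
{
	assert(nr >= 0);
	if (nr == 0)
		return SAME_BLOB;
	if (nr != 1)
		return UNEXPECTED_DIFF;
	switch (status) {
	case 'M':
		return MODIFIED_BLOB;
	case 'A':
	case 'T':
		return NOT_IN_PARENT;
	default:
		return UNEXPECTED_DIFF;
	}
}
```

Only the first two outcomes yield an origin in the parent; for the third the caller gets NULL back and may go on to try rename detection.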
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* We have an origin -- find the path that corresponds to it in its`
			`* parent and return an origin structure to represent it.`
			`*/`
git-pickaxe: split find_origin() into find_rename() and find_origin(). When a merge adds a new file from the second parent, the earlier code tried to find renames in the first parent before noticing that the vertion from the second parent was added without modification. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 02:17:41 +01:00			`static struct origin *find_rename(struct scoreboard *sb,`
			`struct commit *parent,`
			`struct origin *origin)`
			`{`
			`struct origin *porigin = NULL;`
			`struct diff_options diff_opts;`
			`int i;`
			`const char *paths[2];`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
			`diff_setup(&diff_opts);`
Make the diff_options bitfields be an unsigned with explicit masks. reverse_diff was a bit-value in disguise, it's merged in the flags now. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 20:05:14 +01:00			`DIFF_OPT_SET(&diff_opts, RECURSIVE);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`diff_opts.detect_rename = DIFF_DETECT_RENAME;`
			`diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;`
git-pickaxe: rename detection optimization The idea is that we are interested in renaming into only one path, so we do not care about renames that happen elsewhere. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-02 09:02:11 +01:00			`diff_opts.single_follow = origin->path;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`paths[0] = NULL;`
			`diff_tree_setup_paths(paths, &diff_opts);`
			`if (diff_setup_done(&diff_opts) < 0)`
			`die("diff-setup");`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00
			`if (is_null_sha1(origin->commit->object.sha1))`
			`do_diff_cache(parent->tree->object.sha1, &diff_opts);`
			`else`
			`diff_tree_sha1(parent->tree->object.sha1,`
			`origin->commit->tree->object.sha1,`
			`"", &diff_opts);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`diffcore_std(&diff_opts);`

			`for (i = 0; i < diff_queued_diff.nr; i++) {`
			`struct diff_filepair *p = diff_queued_diff.queue[i];`
git-pickaxe: do not keep commit buffer. We need the commit buffer data while generating the final result, but until then we do not need them. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 08:49:31 +02:00			`if ((p->status == 'R' || p->status == 'C') &&`
git-pickaxe: get rid of wasteful find_origin(). After finding out which path in the parent to scan to pass blames, using get_tree_entry() to extract the blob information again was quite wasteful, since diff-tree already gave us that information. Separate the function to create an origin out as get_origin(). You'll never know what is more efficient unless you try and/or think hard. I somehow thought that extracting one known path out of commit's tree is cheaper than running a diff-tree for the current path between the commit and its parent, but it is not the case. In real, non-toy projects, most commits do not touch the path you are interested in, and if the path is a few levels away from the toplevel, whole-subdirectory comparison logic diff-tree allows us to skip opening lower subdirectories. This commit rewrites find_origin() function to use a single-path diff-tree to see if the parent has the same blob as the current suspect, which is cheaper than extracting the blob information using get_tree_entry() and comparing it with what the current suspect has. This shaves about 6% overhead when annotating kernel/sched.c in the Linux kernel repository on my machine. The saving rises to 25% for arch/i386/kernel/Makefile. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 11:56:33 +02:00			`!strcmp(p->two->path, origin->path)) {`
			`porigin = get_origin(sb, parent, p->one->path);`
			`hashcpy(porigin->blob_sha1, p->one->sha1);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`break;`
			`}`
			`}`
			`diff_flush(&diff_opts);`
Fix small memory leaks induced by diff_tree_setup_paths Run diff_tree_release_paths in the appropriate places, and add a test to avoid NULL dereference. Better safe than sorry. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-11 22:59:55 +01:00			`diff_tree_release_paths(&diff_opts);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`return porigin;`
			`}`
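
The loop in find_rename() boils down to scanning the diff result for a rename or copy whose destination is the path being blamed. A minimal sketch of that scan follows; struct pair_sketch and find_rename_source() are hypothetical stand-ins for struct diff_filepair and the surrounding code, not part of git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry of a diff queue. */
struct pair_sketch {
	char status;		/* 'R' rename, 'C' copy, 'M' modify, ... */
	const char *src_path;	/* path in the parent (preimage) */
	const char *dst_path;	/* path in the child (postimage) */
};

/*
 * Return the parent-side path that was renamed or copied to "path",
 * or NULL when no such pair exists. The first match wins.
 */
static const char *find_rename_source(const struct pair_sketch *pairs,
				      size_t nr, const char *path)
{
	size_t i;

	assert(path != NULL);
	for (i = 0; i < nr; i++) {
		const struct pair_sketch *p = &pairs[i];
		if ((p->status == 'R' || p->status == 'C') &&
		    !strcmp(p->dst_path, path))
			return p->src_path;
	}
	return NULL;
}
```

A plain modification ('M') of the same path is deliberately ignored here, because that case is already covered by find_origin().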

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Parsing of patch chunks...`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`struct chunk {`
			`/* line number in postimage; up to but not including this`
			`* line is the same as preimage`
			`*/`
			`int same;`

			`/* preimage line number after this chunk */`
			`int p_next;`

			`/* postimage line number after this chunk */`
			`int t_next;`
			`};`

			`struct patch {`
			`struct chunk *chunks;`
			`int num;`
			`};`

			`struct blame_diff_state {`
			`struct xdiff_emit_state xm;`
			`struct patch *ret;`
			`unsigned hunk_post_context;`
			`unsigned hunk_in_pre_context : 1;`
			`};`

			`static void process_u_diff(void *state_, char *line, unsigned long len)`
			`{`
			`struct blame_diff_state *state = state_;`
			`struct chunk *chunk;`
			`int off1, off2, len1, len2, num;`

			`num = state->ret->num;`
			`if (len < 4 || line[0] != '@' || line[1] != '@') {`
			`if (state->hunk_in_pre_context && line[0] == ' ')`
			`state->ret->chunks[num - 1].same++;`
			`else {`
			`state->hunk_in_pre_context = 0;`
			`if (line[0] == ' ')`
			`state->hunk_post_context++;`
			`else`
			`state->hunk_post_context = 0;`
			`}`
			`return;`
			`}`

			`if (num && state->hunk_post_context) {`
			`chunk = &state->ret->chunks[num - 1];`
			`chunk->p_next -= state->hunk_post_context;`
			`chunk->t_next -= state->hunk_post_context;`
			`}`
			`state->ret->num = ++num;`
			`state->ret->chunks = xrealloc(state->ret->chunks,`
			`sizeof(struct chunk) * num);`
			`chunk = &state->ret->chunks[num - 1];`
			`if (parse_hunk_header(line, len, &off1, &len1, &off2, &len2)) {`
			`state->ret->num--;`
			`return;`
			`}`

			`/* Line numbers in patch output are one based. */`
			`off1--;`
			`off2--;`

			`chunk->same = len2 ? off2 : (off2 + 1);`

			`chunk->p_next = off1 + (len1 ? len1 : 1);`
			`chunk->t_next = chunk->same + len2;`
			`state->hunk_in_pre_context = 1;`
			`state->hunk_post_context = 0;`
			`}`
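
The arithmetic at the end of process_u_diff() is easier to follow with concrete hunk headers. The sketch below assumes headers in the full "@@ -a,b +c,d @@" form and parses them with sscanf() instead of git's parse_hunk_header(); struct chunk_sketch and chunk_from_header() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Same three fields as struct chunk above, all zero-based line numbers. */
struct chunk_sketch {
	int same;	/* postimage lines before this one match the preimage */
	int p_next;	/* preimage line number after this chunk */
	int t_next;	/* postimage line number after this chunk */
};

/*
 * Fill *c from a hunk header. Returns 0 on success, -1 when the line
 * is not in the full "-a,b +c,d" form (the abbreviated form that
 * omits a length of 1 is not handled by this sketch).
 */
static int chunk_from_header(const char *line, struct chunk_sketch *c)
{
	int off1, len1, off2, len2;

	assert(line != NULL && c != NULL);
	if (sscanf(line, "@@ -%d,%d +%d,%d @@",
		   &off1, &len1, &off2, &len2) != 4)
		return -1;

	/* Line numbers in patch output are one based. */
	off1--;
	off2--;

	/* A zero-length range names the line before the change. */
	c->same = len2 ? off2 : (off2 + 1);
	c->p_next = off1 + (len1 ? len1 : 1);
	c->t_next = c->same + len2;
	return 0;
}
```

For "@@ -3,2 +5,4 @@" this gives same = 4, p_next = 4 and t_next = 8. For a pure deletion such as "@@ -5,2 +4,0 @@" it gives same = 4, p_next = 6 and t_next = 4, so the postimage range is empty.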

			`static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,`
			`int context)`
			`{`
			`struct blame_diff_state state;`
			`xpparam_t xpp;`
			`xdemitconf_t xecfg;`
			`xdemitcb_t ecb;`

git-blame -w: ignore whitespace When refactoring code to split one iteration of a too deeply nested loop into a separate function, it inevitably makes the indentation levels shallower (that's the sole point of such a refactoring). With "git blame -w", you can ignore such re-indentation and pass blame for such moved lines to the parent. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-10 03:14:56 +02:00			`xpp.flags = xdl_opts;`
Future-proof source for changes in xdemitconf_t The instances of xdemitconf_t were initialized member by member. Instead, initialize them to all zero, so we do not have to update those places each time we introduce a new member. [jc: minimally fixed by getting rid of a new global] Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-07-04 20:05:46 +02:00			`memset(&xecfg, 0, sizeof(xecfg));`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`xecfg.ctxlen = context;`
			`ecb.outf = xdiff_outf;`
			`ecb.priv = &state;`
			`memset(&state, 0, sizeof(state));`
			`state.xm.consume = process_u_diff;`
			`state.ret = xmalloc(sizeof(struct patch));`
			`state.ret->chunks = NULL;`
			`state.ret->num = 0;`

xdl_diff: identify call sites. This inserts a new function xdi_diff() that currently does not do anything other than calling the underlying xdl_diff() to the callchain of current callers of xdl_diff() function. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-13 22:25:07 +01:00			`xdi_diff(file_p, file_o, &xpp, &xecfg, &ecb);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
			`if (state.ret->num) {`
			`struct chunk *chunk;`
			`chunk = &state.ret->chunks[state.ret->num - 1];`
			`chunk->p_next -= state.hunk_post_context;`
			`chunk->t_next -= state.hunk_post_context;`
			`}`
			`return state.ret;`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Run diff between two origins and grab the patch output, so that`
			`* we can pass blame for lines origin is currently suspected for`
			`* to its parent.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static struct patch *get_patch(struct origin *parent, struct origin *origin)`
			`{`
			`mmfile_t file_p, file_o;`
			`struct patch *patch;`

git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`fill_origin_blob(parent, &file_p);`
			`fill_origin_blob(origin, &file_o);`
			`if (!file_p.ptr || !file_o.ptr)`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`return NULL;`
			`patch = compare_buffer(&file_p, &file_o, 0);`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`num_get_patch++;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`return patch;`
			`}`

			`static void free_patch(struct patch *p)`
			`{`
			`free(p->chunks);`
			`free(p);`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
Assorted typo fixes Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-04 05:49:16 +01:00			`* Link in a new blame entry to the scoreboard. Entries that cover the`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`* same line range have been removed from the scoreboard previously.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static void add_blame_entry(struct scoreboard *sb, struct blame_entry *e)`
			`{`
			`struct blame_entry *ent, *prev = NULL;`

git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`origin_incref(e->suspect);`

git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`for (ent = sb->ent; ent && ent->lno < e->lno; ent = ent->next)`
			`prev = ent;`

			`/* prev, if not NULL, is the last one that is below e */`
			`e->prev = prev;`
			`if (prev) {`
			`e->next = prev->next;`
			`prev->next = e;`
			`}`
			`else {`
			`e->next = sb->ent;`
			`sb->ent = e;`
			`}`
			`if (e->next)`
			`e->next->prev = e;`
			`}`
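
add_blame_entry() is a sorted insert into a doubly linked list keyed on lno. The same linking steps can be tried in isolation with the sketch below; struct entry_sketch and insert_sorted() are hypothetical, and the refcount bump on the suspect is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node: only the fields needed for linking. */
struct entry_sketch {
	int lno;			/* first line covered, the sort key */
	struct entry_sketch *prev;
	struct entry_sketch *next;
};

/* Link e into the list at *head, keeping it ordered by lno. */
static void insert_sorted(struct entry_sketch **head, struct entry_sketch *e)
{
	struct entry_sketch *ent, *prev = NULL;

	assert(head != NULL && e != NULL);
	for (ent = *head; ent && ent->lno < e->lno; ent = ent->next)
		prev = ent;

	/* prev, if not NULL, is the last node that sorts before e */
	e->prev = prev;
	if (prev) {
		e->next = prev->next;
		prev->next = e;
	} else {
		e->next = *head;
		*head = e;
	}
	if (e->next)
		e->next->prev = e;
}
```

Inserting entries with lno 30, 10 and 20 in that order leaves the list as 10, 20, 30 with both the next and prev chains consistent.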

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* src typically is on-stack; we want to copy the information in it to`
Fix grammar nits in documentation and in code comments. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-03 15:18:07 +01:00			`* a malloced blame_entry that is already on the linked list of the`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`* scoreboard. The origin of dst loses a refcnt while the origin of src`
			`* gains one.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static void dup_entry(struct blame_entry *dst, struct blame_entry *src)`
			`{`
			`struct blame_entry *p, *n;`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`p = dst->prev;`
			`n = dst->next;`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`origin_incref(src->suspect);`
			`origin_decref(dst->suspect);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`memcpy(dst, src, sizeof(*src));`
			`dst->prev = p;`
			`dst->next = n;`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`dst->score = 0;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
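
The order of operations in dup_entry() matters: the source's origin gains its reference before the destination's origin loses one, so an origin shared by both entries never drops to zero in between. A minimal sketch of the pattern, assuming a bare integer refcount and hypothetical type names (the real origin_decref() also frees the origin when the count reaches zero):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted payload. */
struct origin_sketch {
	int refcnt;
};

/* Hypothetical list node carrying a payload and some per-entry data. */
struct entry {
	struct entry *prev;
	struct entry *next;
	struct origin_sketch *suspect;
	int lno;
	unsigned score;
};

/* Overwrite dst with src's data while keeping dst's place in the list. */
static void dup_entry_sketch(struct entry *dst, const struct entry *src)
{
	struct entry *p, *n;

	assert(dst && src && dst->suspect && src->suspect);
	p = dst->prev;
	n = dst->next;
	src->suspect->refcnt++;		/* take the new reference first */
	dst->suspect->refcnt--;		/* then drop the old one */
	*dst = *src;
	dst->prev = p;			/* restore the list links */
	dst->next = n;
	dst->score = 0;			/* cached score is no longer valid */
}
```

When dst and src already point at the same origin, its count ends up unchanged, which is the case the increment-first ordering protects.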

			`static const char *nth_line(struct scoreboard *sb, int lno)`
			`{`
			`return sb->final_buf + sb->lineno[lno];`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* It is known that lines between tlno to same came from parent, and e`
			`* has an overlap with that range. It is also known that parent's`
			`* line plno corresponds to e's line tlno.`
			`*`
			`*                <---- e ----->`
			`*                   <------>`
			`*                   <------------>`
			`*             <------------>`
			`*             <------------------>`
			`*`
			`* Split e into potentially three parts: the part before this chunk,`
			`* the chunk to be blamed on the parent, and the part after it.`
			`*/`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`static void split_overlap(struct blame_entry *split,`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`struct blame_entry *e,`
			`int tlno, int plno, int same,`
			`struct origin *parent)`
			`{`
			`int chunk_end_lno;`
			`memset(split, 0, sizeof(struct blame_entry [3]));`

			`if (e->s_lno < tlno) {`
			`/* there is a pre-chunk part not blamed on parent */`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`split[0].suspect = origin_incref(e->suspect);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`split[0].lno = e->lno;`
			`split[0].s_lno = e->s_lno;`
			`split[0].num_lines = tlno - e->s_lno;`
			`split[1].lno = e->lno + tlno - e->s_lno;`
			`split[1].s_lno = plno;`
			`}`
			`else {`
			`split[1].lno = e->lno;`
			`split[1].s_lno = plno + (e->s_lno - tlno);`
			`}`

			`if (same < e->s_lno + e->num_lines) {`
			`/* there is a post-chunk part not blamed on parent */`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`split[2].suspect = origin_incref(e->suspect);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`split[2].lno = e->lno + (same - e->s_lno);`
			`split[2].s_lno = e->s_lno + (same - e->s_lno);`
			`split[2].num_lines = e->s_lno + e->num_lines - same;`
			`chunk_end_lno = split[2].lno;`
			`}`
			`else`
			`chunk_end_lno = e->lno + e->num_lines;`
			`split[1].num_lines = chunk_end_lno - split[1].lno;`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* if it turns out there is nothing to blame the parent for,`
			`* forget about the splitting. !split[1].suspect signals this.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`if (split[1].num_lines < 1)`
			`return;`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`split[1].suspect = origin_incref(parent);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
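
The line arithmetic in split_overlap() can be tried in isolation. The following is a minimal sketch, not the code above: `struct range` and `split_range()` are hypothetical names, origins and refcounting are left out, and an absent part is marked by `num_lines == 0` rather than by a NULL suspect.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for struct blame_entry. */
struct range {
	int lno;       /* first line in the final image */
	int s_lno;     /* first line in the suspect's (or parent's) file */
	int num_lines; /* 0 means "this part is absent" in this sketch */
};

/*
 * Split e against the region [tlno, same) of the suspect that is known
 * to come from the parent, where the suspect's line tlno corresponds
 * to the parent's line plno.
 *
 * out[0]: part before the region (stays with the suspect)
 * out[1]: part passed to the parent (s_lno is in the parent's numbering)
 * out[2]: part after the region (stays with the suspect)
 */
static void split_range(struct range out[3], const struct range *e,
			int tlno, int plno, int same)
{
	int end = e->s_lno + e->num_lines;
	int chunk_end_lno;
	int i;

	/* Assumed precondition: e really overlaps [tlno, same). */
	assert(tlno < end && e->s_lno < same);

	for (i = 0; i < 3; i++)
		out[i].lno = out[i].s_lno = out[i].num_lines = 0;

	if (e->s_lno < tlno) {
		/* e starts before the region: keep a leading part */
		out[0].lno = e->lno;
		out[0].s_lno = e->s_lno;
		out[0].num_lines = tlno - e->s_lno;
		out[1].lno = e->lno + tlno - e->s_lno;
		out[1].s_lno = plno;
	} else {
		/* e starts inside the region: shift the parent line */
		out[1].lno = e->lno;
		out[1].s_lno = plno + (e->s_lno - tlno);
	}

	if (same < end) {
		/* e extends past the region: keep a trailing part */
		out[2].lno = e->lno + (same - e->s_lno);
		out[2].s_lno = same;
		out[2].num_lines = end - same;
		chunk_end_lno = out[2].lno;
	} else {
		chunk_end_lno = e->lno + e->num_lines;
	}
	out[1].num_lines = chunk_end_lno - out[1].lno;
}
```

For an entry covering suspect lines 20..29 (final lines 10..19) and a region [23, 27) that maps to parent line 50, this yields a 3-line leading part, a 4-line middle part starting at parent line 50, and a 3-line trailing part.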

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* split_overlap() divided an existing blame e into up to three parts`
			`* in split. Adjust the linked list of blames in the scoreboard to`
			`* reflect the split.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static void split_blame(struct scoreboard *sb,`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`struct blame_entry *split,`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`struct blame_entry *e)`
			`{`
			`struct blame_entry *new_entry;`

			`if (split[0].suspect && split[2].suspect) {`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/* The first part (reuse storage for the existing entry e) */`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`dup_entry(e, &split[0]);`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/* The last part -- me */`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`new_entry = xmalloc(sizeof(*new_entry));`
			`memcpy(new_entry, &(split[2]), sizeof(struct blame_entry));`
			`add_blame_entry(sb, new_entry);`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/* ... and the middle part -- parent */`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`new_entry = xmalloc(sizeof(*new_entry));`
			`memcpy(new_entry, &(split[1]), sizeof(struct blame_entry));`
			`add_blame_entry(sb, new_entry);`
			`}`
			`else if (!split[0].suspect && !split[2].suspect)`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* The parent covers the entire area; reuse storage for`
			`* e and replace it with the parent.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`dup_entry(e, &split[1]);`
			`else if (split[0].suspect) {`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/* me and then parent */`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`dup_entry(e, &split[0]);`

			`new_entry = xmalloc(sizeof(*new_entry));`
			`memcpy(new_entry, &(split[1]), sizeof(struct blame_entry));`
			`add_blame_entry(sb, new_entry);`
			`}`
			`else {`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/* parent and then me */`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`dup_entry(e, &split[1]);`

			`new_entry = xmalloc(sizeof(*new_entry));`
			`memcpy(new_entry, &(split[2]), sizeof(struct blame_entry));`
			`add_blame_entry(sb, new_entry);`
			`}`

git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`if (DEBUG) { /* sanity */`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`struct blame_entry *ent;`
git-pickaxe: do not keep commit buffer. We need the commit buffer data while generating the final result, but until then we do not need them. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 08:49:31 +02:00			`int lno = sb->ent->lno, corrupt = 0;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
			`for (ent = sb->ent; ent; ent = ent->next) {`
			`if (lno != ent->lno)`
			`corrupt = 1;`
			`if (ent->s_lno < 0)`
			`corrupt = 1;`
			`lno += ent->num_lines;`
			`}`
			`if (corrupt) {`
git-pickaxe: do not keep commit buffer. We need the commit buffer data while generating the final result, but until then we do not need them. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 08:49:31 +02:00			`lno = sb->ent->lno;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`for (ent = sb->ent; ent; ent = ent->next) {`
			`printf("L %8d l %8d n %8d\n",`
			`lno, ent->lno, ent->num_lines);`
			`lno = ent->lno + ent->num_lines;`
			`}`
			`die("oops");`
			`}`
			`}`
			`}`
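
The DEBUG block at the end of split_blame() checks that the entries still tile the final image without gaps or overlaps. A minimal standalone sketch of that invariant follows; `struct ent` and `entries_are_contiguous()` are hypothetical names, and the sketch assumes the list is already sorted by `lno`.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct blame_entry. */
struct ent {
	int lno;       /* first line in the final image */
	int s_lno;     /* first line in the suspect's file */
	int num_lines; /* number of lines covered */
	struct ent *next;
};

/*
 * Return 1 if every entry starts exactly where the previous one ended
 * and no entry has a negative suspect line number, 0 otherwise.
 */
static int entries_are_contiguous(const struct ent *head)
{
	const struct ent *e;
	int lno;

	if (!head)
		return 1;
	lno = head->lno;
	for (e = head; e; e = e->next) {
		if (lno != e->lno || e->s_lno < 0)
			return 0;
		lno += e->num_lines;
	}
	return 1;
}
```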

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* After splitting the blame, the origins used by the`
			`* on-stack blame_entry should lose one refcnt each.`
			`*/`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`static void decref_split(struct blame_entry *split)`
			`{`
			`int i;`

			`for (i = 0; i < 3; i++)`
			`origin_decref(split[i].suspect);`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Helper for blame_chunk(). blame_entry e is known to overlap with`
			`* the patch hunk; split it and pass blame to the parent.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static void blame_overlap(struct scoreboard *sb, struct blame_entry *e,`
			`int tlno, int plno, int same,`
			`struct origin *parent)`
			`{`
			`struct blame_entry split[3];`

			`split_overlap(split, e, tlno, plno, same, parent);`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`if (split[1].suspect)`
			`split_blame(sb, split, e);`
			`decref_split(split);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Find the line number of the last line the target is suspected for.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static int find_last_in_target(struct scoreboard *sb, struct origin *target)`
			`{`
			`struct blame_entry *e;`
			`int last_in_target = -1;`

			`for (e = sb->ent; e; e = e->next) {`
blame: cmp_suspect is not "cmp" anymore. The earlier round makes the function return "is it different" and it does not return a value suitable for sorting anymore. Reverse the logic to return "are they the same suspect" instead, and rename it to "same_suspect()". Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 07:37:51 +01:00			`if (e->guilty || !same_suspect(e->suspect, target))`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`continue;`
			`if (last_in_target < e->s_lno + e->num_lines)`
			`last_in_target = e->s_lno + e->num_lines;`
			`}`
			`return last_in_target;`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Process one hunk from the patch between the current suspect for`
			`* blame_entry e and its parent. Find and split the overlap, and`
			`* pass blame for the overlapping part to the parent.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static void blame_chunk(struct scoreboard *sb,`
			`int tlno, int plno, int same,`
			`struct origin *target, struct origin *parent)`
			`{`
git-pickaxe: do not keep commit buffer. We need the commit buffer data while generating the final result, but until then we do not need them. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 08:49:31 +02:00			`struct blame_entry *e;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
git-pickaxe: do not keep commit buffer. We need the commit buffer data while generating the final result, but until then we do not need them. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 08:49:31 +02:00			`for (e = sb->ent; e; e = e->next) {`
blame: cmp_suspect is not "cmp" anymore. The earlier round makes the function return "is it different" and it does not return a value suitable for sorting anymore. Reverse the logic to return "are they the same suspect" instead, and rename it to "same_suspect()". Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 07:37:51 +01:00			`if (e->guilty || !same_suspect(e->suspect, target))`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`continue;`
			`if (same <= e->s_lno)`
			`continue;`
			`if (tlno < e->s_lno + e->num_lines)`
			`blame_overlap(sb, e, tlno, plno, same, parent);`
			`}`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* We are looking at the origin 'target' and aiming to pass blame`
			`* for the lines it is suspected of to its parent. Run diff to find`
			`* which lines came from parent and pass blame for them.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static int pass_blame_to_parent(struct scoreboard *sb,`
			`struct origin *target,`
			`struct origin *parent)`
			`{`
			`int i, last_in_target, plno, tlno;`
			`struct patch *patch;`

			`last_in_target = find_last_in_target(sb, target);`
			`if (last_in_target < 0)`
			`return 1; /* nothing remains for this target */`

			`patch = get_patch(parent, target);`
			`plno = tlno = 0;`
			`for (i = 0; i < patch->num; i++) {`
			`struct chunk *chunk = &patch->chunks[i];`

			`blame_chunk(sb, tlno, plno, chunk->same, target, parent);`
			`plno = chunk->p_next;`
			`tlno = chunk->t_next;`
			`}`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/* The rest (i.e. anything after tlno) are the same as the parent */`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`blame_chunk(sb, tlno, plno, last_in_target, target, parent);`

			`free_patch(patch);`
			`return 0;`
			`}`
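
The loop in pass_blame_to_parent() walks the patch hunks and treats the lines between hunks, plus the tail after the last hunk, as unchanged from the parent. The sketch below lists those matching regions for a plain array of hunks. `struct hunk` mirrors only the three chunk fields used above (`same`, `p_next`, `t_next`); `struct match` and `matching_regions()` are hypothetical, and the caller is assumed to provide room for `nr + 1` results.

```c
/* Only the three fields of a chunk that the loop above reads. */
struct hunk {
	int same;   /* target line where the unchanged run before this hunk ends */
	int p_next; /* parent line just after this hunk */
	int t_next; /* target line just after this hunk */
};

/* Hypothetical result type: len target lines from tlno match parent from plno. */
struct match {
	int tlno;
	int plno;
	int len;
};

/*
 * Collect the regions of the target that are identical to the parent.
 * target_end plays the role of last_in_target: one past the last target
 * line of interest. Returns the number of regions written to out.
 */
static int matching_regions(const struct hunk *hunks, int nr,
			    int target_end, struct match *out)
{
	int i, n = 0;
	int tlno = 0, plno = 0;

	for (i = 0; i < nr; i++) {
		if (tlno < hunks[i].same) {
			out[n].tlno = tlno;
			out[n].plno = plno;
			out[n].len = hunks[i].same - tlno;
			n++;
		}
		plno = hunks[i].p_next;
		tlno = hunks[i].t_next;
	}
	/* Everything after the last hunk is the same as the parent. */
	if (tlno < target_end) {
		out[n].tlno = tlno;
		out[n].plno = plno;
		out[n].len = target_end - tlno;
		n++;
	}
	return n;
}
```

With a single hunk `{same = 3, p_next = 5, t_next = 4}` and `target_end = 10`, the regions are target lines 0..2 matching parent lines 0..2, and target lines 4..9 matching parent lines 5..10.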

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* The lines in blame_entry after splitting blames many times can become`
			`* very small and trivial, and at some point it becomes pointless to`
			`* blame the parents. E.g. "\t\t}\n\t}\n\n" appears everywhere in any`
			`* ordinary C program, and it is not worth saying it was copied from`
			`* a totally unrelated file in the parent.`
			`*`
			`* Compute how trivial the lines in the blame_entry are.`
			`*/`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`static unsigned ent_score(struct scoreboard *sb, struct blame_entry *e)`
			`{`
			`unsigned score;`
			`const char *cp, *ep;`

			`if (e->score)`
			`return e->score;`

git-pickaxe: do not keep commit buffer. We need the commit buffer data while generating the final result, but until then we do not need them. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 08:49:31 +02:00			`score = 1;`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`cp = nth_line(sb, e->lno);`
			`ep = nth_line(sb, e->lno + e->num_lines);`
			`while (cp < ep) {`
			`unsigned ch = *((unsigned char *)cp);`
			`if (isalnum(ch))`
			`score++;`
			`cp++;`
			`}`
			`e->score = score;`
			`return score;`
			`}`
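
The scoring idea in ent_score() is to count alphanumeric characters so that runs of braces and whitespace score low. A minimal sketch on a plain buffer follows; `span_score()` is a hypothetical helper, not part of the code above. Starting at 1 mirrors the function above, where a score of 0 appears to mean "not computed yet".

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: score a span of text by the number of
 * alphanumeric characters it contains. The result is never 0, so a
 * caller may use 0 as a "not cached" marker.
 */
static unsigned span_score(const char *buf, size_t len)
{
	unsigned score = 1;
	size_t i;

	for (i = 0; i < len; i++) {
		/* cast avoids passing a negative char value to isalnum() */
		if (isalnum((unsigned char)buf[i]))
			score++;
	}
	return score;
}
```

A span such as "\t\t}\n\t}\n\n" has no alphanumeric characters and gets the minimum score of 1, while "return 0;\n" scores 8.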

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* best_so_far[] and this[] are both a split of an existing blame_entry`
			`* that passes blame to the parent. Maintain best_so_far the best split`
			`* so far, by comparing this and best_so_far and copying this into`
			`* best_so_far as needed.`
			`*/`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`static void copy_split_if_better(struct scoreboard *sb,`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`struct blame_entry *best_so_far,`
			`struct blame_entry *this)`
git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00			`{`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`int i;`

git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00			`if (!this[1].suspect)`
			`return;`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`if (best_so_far[1].suspect) {`
			`if (ent_score(sb, &this[1]) < ent_score(sb, &best_so_far[1]))`
			`return;`
			`}`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00
			`for (i = 0; i < 3; i++)`
			`origin_incref(this[i].suspect);`
			`decref_split(best_so_far);`
git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00			`memcpy(best_so_far, this, sizeof(struct blame_entry [3]));`
			`}`

blame: Notice a wholesale incorporation of an existing file. The -C option to blame tries to find a section of a preimage file by running diff against the lines whose origin is still unknown, and excluding the different parts. The code however did not cover the case where the tail part of the section matched, which we handle for the normal non-move/copy codepath. This breakage was most visible when preimage file matches in its entirety and failed to pass blame in such a case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-05 18:13:26 +02:00			`/*`
			`* We are looking at a part of the final image represented by`
			`* ent (tlno and same are offset by ent->s_lno).`
			`* tlno is where we are looking at in the final image.`
			`* Lines up to (but not including) same match the preimage.`
			`* plno is where we are looking at in the preimage.`
			`*`
			`* <-------------- final image ---------------------->`
			`*       <------ent------>`
			`*         ^tlno ^same`
			`*    <---------preimage----->`
			`*         ^plno`
			`*`
			`* All line numbers are 0-based.`
			`*/`
			`static void handle_split(struct scoreboard *sb,`
			`struct blame_entry *ent,`
			`int tlno, int plno, int same,`
			`struct origin *parent,`
			`struct blame_entry *split)`
			`{`
			`if (ent->num_lines <= tlno)`
			`return;`
			`if (tlno < same) {`
			`struct blame_entry this[3];`
			`tlno += ent->s_lno;`
			`same += ent->s_lno;`
			`split_overlap(this, ent, tlno, plno, same, parent);`
			`copy_split_if_better(sb, split, this);`
			`decref_split(this);`
			`}`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Find the lines from parent that are the same as ent so that`
			`* we can pass blames to it. file_p has the blob contents for`
			`* the parent.`
			`*/`
git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00			`static void find_copy_in_blob(struct scoreboard *sb,`
			`struct blame_entry *ent,`
			`struct origin *parent,`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`struct blame_entry *split,`
git-pickaxe -M: blame line movements within a file. 2006-10-20 03:49:30 +02:00			`mmfile_t *file_p)`
			`{`
			`const char *cp;`
			`int cnt;`
			`mmfile_t file_o;`
			`struct patch *patch;`
			`int i, plno, tlno;`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Prepare mmfile that contains only the lines in ent.`
			`*/`
git-pickaxe -M: blame line movements within a file. 2006-10-20 03:49:30 +02:00			`cp = nth_line(sb, ent->lno);`
			`file_o.ptr = (char*) cp;`
			`cnt = ent->num_lines;`

			`while (cnt && cp < sb->final_buf + sb->final_buf_size) {`
			`if (*cp++ == '\n')`
			`cnt--;`
			`}`
			`file_o.size = cp - file_o.ptr;`

			`patch = compare_buffer(file_p, &file_o, 1);`

blame: Notice a wholesale incorporation of an existing file. The -C option to blame tries to find a section of a preimage file by running diff against the lines whose origin is still unknown, and excluding the different parts. The code however did not cover the case where the tail part of the section matched, which we handle for the normal non-move/copy codepath. This breakage was most visible when preimage file matches in its entirety and failed to pass blame in such a case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-05 18:13:26 +02:00			`/*`
			`* file_o is a part of final image we are annotating.`
			`* file_p may partially match that image.`
			`*/`
git-pickaxe -M: blame line movements within a file. 2006-10-20 03:49:30 +02:00			`memset(split, 0, sizeof(struct blame_entry [3]));`
			`plno = tlno = 0;`
			`for (i = 0; i < patch->num; i++) {`
			`struct chunk *chunk = &patch->chunks[i];`

blame: Notice a wholesale incorporation of an existing file. 2007-05-05 18:13:26 +02:00			`handle_split(sb, ent, tlno, plno, chunk->same, parent, split);`
git-pickaxe -M: blame line movements within a file. 2006-10-20 03:49:30 +02:00			`plno = chunk->p_next;`
			`tlno = chunk->t_next;`
			`}`
blame: Notice a wholesale incorporation of an existing file. 2007-05-05 18:13:26 +02:00			`/* remainder, if any, all match the preimage */`
			`handle_split(sb, ent, tlno, plno, ent->num_lines, parent, split);`
git-pickaxe -M: blame line movements within a file. 2006-10-20 03:49:30 +02:00			`free_patch(patch);`
			`}`
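
Preparing file_o amounts to slicing a run of whole lines out of a larger buffer without copying it. The sketch below shows that idea on its own, under stated assumptions: `struct slice` and `slice_lines()` are hypothetical names, and the start of the run is found by scanning from the top of the buffer, whereas the code above gets it from nth_line().

```c
#include <stddef.h>

/* Hypothetical view into a buffer; nothing is copied. */
struct slice {
	const char *ptr;
	size_t size;
};

/*
 * Return the bytes of `count` lines starting at 0-based line `first`.
 * A final line without a trailing newline is still included, because
 * the scan simply stops at the end of the buffer.
 */
static struct slice slice_lines(const char *buf, size_t size,
				int first, int count)
{
	const char *end = buf + size;
	const char *cp = buf;
	struct slice s;

	/* Skip the lines that come before the run. */
	while (first > 0 && cp < end)
		if (*cp++ == '\n')
			first--;
	s.ptr = cp;

	/* Consume up to `count` lines, bounded by the buffer end. */
	while (count > 0 && cp < end)
		if (*cp++ == '\n')
			count--;
	s.size = (size_t)(cp - s.ptr);
	return s;
}
```

The bound check against the end of the buffer plays the same role as the `cp < sb->final_buf + sb->final_buf_size` test above: it keeps the scan from running off the buffer when the last line has no newline.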

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* See if the lines that target is currently suspected for can be`
			`* attributed to parent.`
			`*/`
git-pickaxe -M: blame line movements within a file. 2006-10-20 03:49:30 +02:00			`static int find_move_in_parent(struct scoreboard *sb,`
			`struct origin *target,`
			`struct origin *parent)`
			`{`
git-pickaxe: re-scan the blob after making progress with -M Otherwise we would miss copied lines that are contained in the parts before or after the part that we find after splitting the blame_entry (i.e. split[0] and split[2]). Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-04 21:37:02 +01:00			`int last_in_target, made_progress;`
git-pickaxe: do not confuse two origins that are the same. It used to be that we can compare the address of the origin structure to determine if they are the same because they are always registered with scoreboard. After introduction of the loop to try finding the best split, that is not true anymore. The current code has rather serious leaks with origin structure, but more importantly it gets confused when two origins that points at the same commit and same path. We might eventually have to refcount and gc origin, but let's fix the correctness issue first. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 09:41:38 +02:00			`struct blame_entry *e, split[3];`
git-pickaxe -M: blame line movements within a file. 2006-10-20 03:49:30 +02:00			`mmfile_t file_p;`

			`last_in_target = find_last_in_target(sb, target);`
			`if (last_in_target < 0)`
			`return 1; /* nothing remains for this target */`

git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`fill_origin_blob(parent, &file_p);`
			`if (!file_p.ptr)`
git-pickaxe -M: blame line movements within a file. 2006-10-20 03:49:30 +02:00			`return 0;`

git-pickaxe: re-scan the blob after making progress with -M 2006-11-04 21:37:02 +01:00			`made_progress = 1;`
			`while (made_progress) {`
			`made_progress = 0;`
			`for (e = sb->ent; e; e = e->next) {`
blame: cmp_suspect is not "cmp" anymore. The earlier round makes the function return "is it different" and it does not return a value suitable for sorting anymore. Reverse the logic to return "are they the same suspect" instead, and rename it to "same_suspect()". Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 07:37:51 +01:00			`if (e->guilty || !same_suspect(e->suspect, target))`
git-pickaxe: re-scan the blob after making progress with -M 2006-11-04 21:37:02 +01:00			`continue;`
			`find_copy_in_blob(sb, e, parent, split, &file_p);`
			`if (split[1].suspect &&`
			`blame_move_score < ent_score(sb, &split[1])) {`
			`split_blame(sb, split, e);`
			`made_progress = 1;`
			`}`
			`decref_split(split);`
			`}`
git-pickaxe -M: blame line movements within a file. 2006-10-20 03:49:30 +02:00			`}`
			`return 0;`
			`}`
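
The made_progress loop in find_move_in_parent() is a rescan-until-stable pattern: a pass that changes something may expose more work, so passes repeat until one changes nothing. The toy sketch below shows only that control flow. `struct work`, `MAX_TAKE`, and `rescan_until_stable()` are hypothetical, and settling a fixed number of lines per pass is an assumption made for illustration; the real loop splits blame entries instead.

```c
#define MAX_TAKE 4	/* hypothetical cap on lines settled per entry per pass */

struct work {
	int remaining;	/* lines still waiting to be attributed */
};

/* Returns the number of passes made, including the final idle pass. */
static int rescan_until_stable(struct work *w, int n)
{
	int passes = 0;
	int made_progress = 1;

	while (made_progress) {
		int i;

		made_progress = 0;
		for (i = 0; i < n; i++) {
			int take;

			if (!w[i].remaining)
				continue;	/* already settled; skip it */
			take = w[i].remaining < MAX_TAKE ?
				w[i].remaining : MAX_TAKE;
			w[i].remaining -= take;
			made_progress = 1;	/* schedule another pass */
		}
		passes++;
	}
	return passes;
}
```

The loop always pays for one extra pass that finds nothing to do. That is the price of not tracking which entries a split might have affected.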

git-pickaxe: re-scan the blob after making progress with -C The reason to do this is the same as in the previous change for line copy detection within the same file (-M). Also this fixes -C and -C -C (aka find-copies-harder) logic; in this application we are not interested in the similarity matching diffcore-rename makes, because we are only interested in scanning files that were modified, or in the case of -C -C, scanning all files in the parent and we want to do that ourselves. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 01:39:03 +01:00			`struct blame_list {`
			`struct blame_entry *ent;`
			`struct blame_entry split[3];`
			`};`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Count the number of entries the target is suspected for,`
			`* and prepare a list that pairs each entry with its best split.`
			`*/`
git-pickaxe: re-scan the blob after making progress with -C 2006-11-05 01:39:03 +01:00			`static struct blame_list *setup_blame_list(struct scoreboard *sb,`
			`struct origin *target,`
			`int *num_ents_p)`
			`{`
			`struct blame_entry *e;`
			`int num_ents, i;`
			`struct blame_list *blame_list = NULL;`

			`for (e = sb->ent, num_ents = 0; e; e = e->next)`
blame: cmp_suspect is not "cmp" anymore. 2007-03-21 07:37:51 +01:00			`if (!e->guilty && same_suspect(e->suspect, target))`
git-pickaxe: re-scan the blob after making progress with -C 2006-11-05 01:39:03 +01:00			`num_ents++;`
			`if (num_ents) {`
			`blame_list = xcalloc(num_ents, sizeof(struct blame_list));`
			`for (e = sb->ent, i = 0; e; e = e->next)`
blame: cmp_suspect is not "cmp" anymore. 2007-03-21 07:37:51 +01:00			`if (!e->guilty && same_suspect(e->suspect, target))`
git-pickaxe: re-scan the blob after making progress with -C 2006-11-05 01:39:03 +01:00			`blame_list[i++].ent = e;`
			`}`
			`*num_ents_p = num_ents;`
			`return blame_list;`
			`}`
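
setup_blame_list() walks the entry list twice: once to count the matching entries, then again to fill an array of exactly that size. A minimal sketch of the same count-then-fill idea follows. `struct item` and `collect_pending()` are hypothetical, an integer `owner` stands in for the suspect comparison, and plain calloc() with an explicit check replaces xcalloc(), which dies on allocation failure instead of returning NULL.

```c
#include <stdlib.h>

struct item {
	int owner;		/* hypothetical stand-in for the suspect */
	int settled;		/* hypothetical stand-in for the guilty flag */
	struct item *next;
};

/*
 * Collect pointers to every unsettled item that belongs to `owner`.
 * Returns NULL and stores 0 in *num_p when nothing matches.
 * The caller frees the returned array.
 */
static struct item **collect_pending(struct item *head, int owner,
				     int *num_p)
{
	struct item *it;
	struct item **list = NULL;
	int num = 0, i = 0;

	/* First pass: count, so the array can be sized exactly. */
	for (it = head; it; it = it->next)
		if (!it->settled && it->owner == owner)
			num++;
	if (num) {
		list = calloc(num, sizeof(*list));
		if (!list) {
			*num_p = 0;
			return NULL;	/* allocation failed */
		}
		/* Second pass: fill in list order. */
		for (it = head; it; it = it->next)
			if (!it->settled && it->owner == owner)
				list[i++] = it;
	}
	*num_p = num;
	return list;
}
```

Both passes must use the same predicate. If they disagree, the second pass either overruns the array or leaves NULL slots at its end.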

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* For lines target is suspected for, see if we can find code movement`
			`* across file boundary from the parent commit. porigin is the path`
			`* in the parent we already tried.`
			`*/`
git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00			`static int find_copy_in_parent(struct scoreboard *sb,`
			`struct origin *target,`
			`struct commit *parent,`
			`struct origin *porigin,`
			`int opt)`
			`{`
			`struct diff_options diff_opts;`
			`const char *paths[1];`
git-pickaxe: swap comparison loop used for -C When assigning blames for code movements across file boundaries, we used to iterate over blame entries (i.e. groups of lines to be blamed) in the outer loop and compared each entry with paths in the parent commit in an inner loop. This meant that we opened the blob data from each path number of times. Reorganize the loop so that we read the same path only once, and compare it against all relevant blame entries. This should perform better, but seems to give mixed results, though. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 12:30:53 +02:00			`int i, j;`
git-pickaxe: re-scan the blob after making progress with -C 2006-11-05 01:39:03 +01:00			`int retval;`
			`struct blame_list *blame_list;`
git-pickaxe: swap comparison loop used for -C 2006-10-21 12:30:53 +02:00			`int num_ents;`
git-pickaxe -C: blame cut-and-pasted lines. 2006-10-20 03:50:17 +02:00
git-pickaxe: re-scan the blob after making progress with -C 2006-11-05 01:39:03 +01:00			`blame_list = setup_blame_list(sb, target, &num_ents);`
			`if (!blame_list)`
git-pickaxe -C: blame cut-and-pasted lines. 2006-10-20 03:50:17 +02:00			`return 1; /* nothing remains for this target */`

			`diff_setup(&diff_opts);`
Make the diff_options bitfields be an unsigned with explicit masks. reverse_diff was a bit-value in disguise, it's merged in the flags now. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-10 20:05:14 +01:00			`DIFF_OPT_SET(&diff_opts, RECURSIVE);`
git-pickaxe -C: blame cut-and-pasted lines. 2006-10-20 03:50:17 +02:00			`diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;`

			`paths[0] = NULL;`
			`diff_tree_setup_paths(paths, &diff_opts);`
			`if (diff_setup_done(&diff_opts) < 0)`
			`die("diff-setup");`
git-pickaxe: re-scan the blob after making progress with -C 2006-11-05 01:39:03 +01:00
			`/* Try "find copies harder" on new path if requested;`
			`* we do not want to use diffcore_rename() actually to`
			`* match things up; find_copies_harder is set only to`
			`* force diff_tree_sha1() to feed all filepairs to diff_queue,`
			`* and this code needs to be after diff_setup_done(), which`
			`* usually makes find-copies-harder imply copy detection.`
			`*/`
blame: -C -C -C When you do this, existing "blame -C -C" would not find that the latter half of the file2 came from the existing file1: ... both file1 and file2 are tracked ... $ cat file1 >>file2 $ git add file1 file2 $ git commit This is because we avoid the expensive find-copies-harder code that makes unchanged file (in this case, file1) as a candidate for copy & paste source when annotating an existing file (file2). The third -C now allows it. However, this obviously makes the process very expensive. We've actually seen this patch before, but I dismissed it because it covers such a narrow (and arguably stupid) corner case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-06 06:18:57 +02:00			`if ((opt & PICKAXE_BLAME_COPY_HARDEST)`
			`|| ((opt & PICKAXE_BLAME_COPY_HARDER)`
			`&& (!porigin || strcmp(target->path, porigin->path))))`
Make the diff_options bitfields be an unsigned with explicit masks. 2007-11-10 20:05:14 +01:00			`DIFF_OPT_SET(&diff_opts, FIND_COPIES_HARDER);`
git-pickaxe: re-scan the blob after making progress with -C 2006-11-05 01:39:03 +01:00
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`if (is_null_sha1(target->commit->object.sha1))`
			`do_diff_cache(parent->tree->object.sha1, &diff_opts);`
			`else`
			`diff_tree_sha1(parent->tree->object.sha1,`
			`target->commit->tree->object.sha1,`
			`"", &diff_opts);`
git-pickaxe -C: blame cut-and-pasted lines. 2006-10-20 03:50:17 +02:00
Make the diff_options bitfields be an unsigned with explicit masks. 2007-11-10 20:05:14 +01:00			`if (!DIFF_OPT_TST(&diff_opts, FIND_COPIES_HARDER))`
git-pickaxe: re-scan the blob after making progress with -C 2006-11-05 01:39:03 +01:00			`diffcore_std(&diff_opts);`
git-pickaxe -C: blame cut-and-pasted lines. 2006-10-20 03:50:17 +02:00
git-pickaxe: re-scan the blob after making progress with -C 2006-11-05 01:39:03 +01:00			`retval = 0;`
			`while (1) {`
			`int made_progress = 0;`

			`for (i = 0; i < diff_queued_diff.nr; i++) {`
			`struct diff_filepair *p = diff_queued_diff.queue[i];`
			`struct origin *norigin;`
			`mmfile_t file_p;`
			`struct blame_entry this[3];`

			`if (!DIFF_FILE_VALID(p->one))`
			`continue; /* does not exist in parent */`
			`if (porigin && !strcmp(p->one->path, porigin->path))`
			`/* find_move already dealt with this path */`
			`continue;`

			`norigin = get_origin(sb, parent, p->one->path);`
			`hashcpy(norigin->blob_sha1, p->one->sha1);`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`fill_origin_blob(norigin, &file_p);`
git-pickaxe: re-scan the blob after making progress with -C The reason to do this is the same as in the previous change for line copy detection within the same file (-M). Also this fixes -C and -C -C (aka find-copies-harder) logic; in this application we are not interested in the similarity matching diffcore-rename makes, because we are only interested in scanning files that were modified, or in the case of -C -C, scanning all files in the parent and we want to do that ourselves. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 01:39:03 +01:00			`if (!file_p.ptr)`
			`continue;`

			`for (j = 0; j < num_ents; j++) {`
			`find_copy_in_blob(sb, blame_list[j].ent,`
			`norigin, this, &file_p);`
			`copy_split_if_better(sb, blame_list[j].split,`
			`this);`
			`decref_split(this);`
			`}`
			`origin_decref(norigin);`
			`}`
git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00
git-pickaxe: swap comparison loop used for -C When assigning blames for code movements across file boundaries, we used to iterate over blame entries (i.e. groups of lines to be blamed) in the outer loop and compared each entry with paths in the parent commit in an inner loop. This meant that we opened the blob data from each path number of times. Reorganize the loop so that we read the same path only once, and compare it against all relevant blame entries. This should perform better, but seems to give mixed results, though. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 12:30:53 +02:00			`for (j = 0; j < num_ents; j++) {`
git-pickaxe: re-scan the blob after making progress with -C The reason to do this is the same as in the previous change for line copy detection within the same file (-M). Also this fixes -C and -C -C (aka find-copies-harder) logic; in this application we are not interested in the similarity matching diffcore-rename makes, because we are only interested in scanning files that were modified, or in the case of -C -C, scanning all files in the parent and we want to do that ourselves. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 01:39:03 +01:00			`struct blame_entry *split = blame_list[j].split;`
			`if (split[1].suspect &&`
			`blame_copy_score < ent_score(sb, &split[1])) {`
			`split_blame(sb, split, blame_list[j].ent);`
			`made_progress = 1;`
			`}`
			`decref_split(split);`
git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00			`}`
git-pickaxe: re-scan the blob after making progress with -C The reason to do this is the same as in the previous change for line copy detection within the same file (-M). Also this fixes -C and -C -C (aka find-copies-harder) logic; in this application we are not interested in the similarity matching diffcore-rename makes, because we are only interested in scanning files that were modified, or in the case of -C -C, scanning all files in the parent and we want to do that ourselves. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 01:39:03 +01:00			`free(blame_list);`
git-pickaxe: swap comparison loop used for -C When assigning blames for code movements across file boundaries, we used to iterate over blame entries (i.e. groups of lines to be blamed) in the outer loop and compared each entry with paths in the parent commit in an inner loop. This meant that we opened the blob data from each path number of times. Reorganize the loop so that we read the same path only once, and compare it against all relevant blame entries. This should perform better, but seems to give mixed results, though. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 12:30:53 +02:00
git-pickaxe: re-scan the blob after making progress with -C The reason to do this is the same as in the previous change for line copy detection within the same file (-M). Also this fixes -C and -C -C (aka find-copies-harder) logic; in this application we are not interested in the similarity matching diffcore-rename makes, because we are only interested in scanning files that were modified, or in the case of -C -C, scanning all files in the parent and we want to do that ourselves. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 01:39:03 +01:00			`if (!made_progress)`
			`break;`
			`blame_list = setup_blame_list(sb, target, &num_ents);`
			`if (!blame_list) {`
			`retval = 1;`
			`break;`
			`}`
git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00			`}`
git-pickaxe: re-scan the blob after making progress with -C The reason to do this is the same as in the previous change for line copy detection within the same file (-M). Also this fixes -C and -C -C (aka find-copies-harder) logic; in this application we are not interested in the similarity matching diffcore-rename makes, because we are only interested in scanning files that were modified, or in the case of -C -C, scanning all files in the parent and we want to do that ourselves. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 01:39:03 +01:00			`diff_flush(&diff_opts);`
Fix small memory leaks induced by diff_tree_setup_paths Run diff_tree_release_paths in the appropriate places, and add a test to avoid NULL dereference. Better safe than sorry. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-11 22:59:55 +01:00			`diff_tree_release_paths(&diff_opts);`
git-pickaxe: re-scan the blob after making progress with -C The reason to do this is the same as in the previous change for line copy detection within the same file (-M). Also this fixes -C and -C -C (aka find-copies-harder) logic; in this application we are not interested in the similarity matching diffcore-rename makes, because we are only interested in scanning files that were modified, or in the case of -C -C, scanning all files in the parent and we want to do that ourselves. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 01:39:03 +01:00			`return retval;`
git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* The blobs of origin and porigin exactly match, so everything`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`* origin is suspected for can be blamed on the parent.`
			`*/`
			`static void pass_whole_blame(struct scoreboard *sb,`
			`struct origin *origin, struct origin *porigin)`
			`{`
			`struct blame_entry *e;`

			`if (!porigin->file.ptr && origin->file.ptr) {`
			`/* Steal its file */`
			`porigin->file = origin->file;`
			`origin->file.ptr = NULL;`
			`}`
			`for (e = sb->ent; e; e = e->next) {`
blame: cmp_suspect is not "cmp" anymore. The earlier round makes the function return "is it different" and it does not return a value suitable for sorting anymore. Reverse the logic to return "are they the same suspect" instead, and rename it to "same_suspect()". Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 07:37:51 +01:00			`if (!same_suspect(e->suspect, origin))`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`continue;`
			`origin_incref(porigin);`
			`origin_decref(e->suspect);`
			`e->suspect = porigin;`
			`}`
			`}`
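
The reference-count handoff in `pass_whole_blame()` above can be shown in isolation. This is a minimal sketch, not the code from this file: `toy_origin`, `toy_entry`, `toy_incref`, `toy_decref` and `toy_pass_whole` are hypothetical stand-ins, and the real `struct origin` carries more state (commit, path, cached blob). The point it illustrates is the ordering: take the reference on the new suspect before releasing the old one.

```c
#include <stdlib.h>

/* Hypothetical stand-in for blame's origin: only a reference count. */
struct toy_origin {
	int refcnt;
};

/* Takes one more reference and returns the same pointer. */
static struct toy_origin *toy_incref(struct toy_origin *o)
{
	if (o)
		o->refcnt++;
	return o;
}

/* Drops one reference; frees the object when the last one goes away. */
static void toy_decref(struct toy_origin *o)
{
	if (o && --o->refcnt <= 0)
		free(o);
}

/* Hypothetical stand-in for a blame entry: a suspect and a list link. */
struct toy_entry {
	struct toy_origin *suspect;
	struct toy_entry *next;
};

/*
 * Re-point every entry that suspects 'from' at 'to'.
 * The caller is assumed to hold its own reference on 'from',
 * so 'from' stays valid for the whole walk.
 * Returns the number of entries that were moved.
 */
static int toy_pass_whole(struct toy_entry *list,
			  struct toy_origin *from, struct toy_origin *to)
{
	int moved = 0;
	struct toy_entry *e;

	for (e = list; e; e = e->next) {
		if (e->suspect != from)
			continue;
		toy_incref(to);		/* take the new reference first */
		toy_decref(e->suspect);	/* then release the old one */
		e->suspect = to;
		moved++;
	}
	return moved;
}
```

With two entries suspecting the same child origin and the caller holding one reference on each origin, a call moves both entries, leaves the child with only the caller's reference, and raises the parent's count by two.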

git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`#define MAXPARENT 16`

git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00			`static void pass_blame(struct scoreboard *sb, struct origin *origin, int opt)`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`{`
git-pickaxe: split find_origin() into find_rename() and find_origin(). When a merge adds a new file from the second parent, the earlier code tried to find renames in the first parent before noticing that the vertion from the second parent was added without modification. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 02:17:41 +01:00			`int i, pass;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`struct commit *commit = origin->commit;`
			`struct commit_list *parent;`
			`struct origin *parent_origin[MAXPARENT], *porigin;`

			`memset(parent_origin, 0, sizeof(parent_origin));`

git-pickaxe: split find_origin() into find_rename() and find_origin(). When a merge adds a new file from the second parent, the earlier code tried to find renames in the first parent before noticing that the vertion from the second parent was added without modification. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 02:17:41 +01:00			`/* The first pass looks for unrenamed path to optimize for`
			`* common cases, then we look for renames in the second pass.`
			`*/`
			`for (pass = 0; pass < 2; pass++) {`
			`struct origin *(*find)(struct scoreboard *,`
			`struct commit *, struct origin *);`
			`find = pass ? find_rename : find_origin;`

			`for (i = 0, parent = commit->parents;`
			`i < MAXPARENT && parent;`
			`parent = parent->next, i++) {`
			`struct commit *p = parent->item;`
git-pickaxe: simplify Octopus merges further If more than one parents in an Octopus merge have the same origin, ignore later ones because it would not make any difference in the outcome. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-04 21:20:09 +01:00			`int j, same;`
git-pickaxe: split find_origin() into find_rename() and find_origin(). When a merge adds a new file from the second parent, the earlier code tried to find renames in the first parent before noticing that the vertion from the second parent was added without modification. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 02:17:41 +01:00
			`if (parent_origin[i])`
			`continue;`
			`if (parse_commit(p))`
			`continue;`
git-pickaxe: cache one already found path per commit. Depending on how bushy the commit DAG is, this saves calls to the internal diff-tree for fork-point commits. For example, annotating Makefile in the kernel repository saves about a third of such diff-tree calls. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 10:00:01 +01:00			`porigin = find(sb, p, origin);`
git-pickaxe: split find_origin() into find_rename() and find_origin(). When a merge adds a new file from the second parent, the earlier code tried to find renames in the first parent before noticing that the vertion from the second parent was added without modification. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 02:17:41 +01:00			`if (!porigin)`
			`continue;`
			`if (!hashcmp(porigin->blob_sha1, origin->blob_sha1)) {`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`pass_whole_blame(sb, origin, porigin);`
git-pickaxe: split find_origin() into find_rename() and find_origin(). When a merge adds a new file from the second parent, the earlier code tried to find renames in the first parent before noticing that the vertion from the second parent was added without modification. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-31 02:17:41 +01:00			`origin_decref(porigin);`
			`goto finish;`
			`}`
git-pickaxe: simplify Octopus merges further If more than one parents in an Octopus merge have the same origin, ignore later ones because it would not make any difference in the outcome. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-04 21:20:09 +01:00			`for (j = same = 0; j < i; j++)`
git-pickaxe: re-scan the blob after making progress with -C The reason to do this is the same as in the previous change for line copy detection within the same file (-M). Also this fixes -C and -C -C (aka find-copies-harder) logic; in this application we are not interested in the similarity matching diffcore-rename makes, because we are only interested in scanning files that were modified, or in the case of -C -C, scanning all files in the parent and we want to do that ourselves. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 01:39:03 +01:00			`if (parent_origin[j] &&`
			`!hashcmp(parent_origin[j]->blob_sha1,`
git-pickaxe: simplify Octopus merges further If more than one parents in an Octopus merge have the same origin, ignore later ones because it would not make any difference in the outcome. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-04 21:20:09 +01:00			`porigin->blob_sha1)) {`
			`same = 1;`
			`break;`
			`}`
			`if (!same)`
			`parent_origin[i] = porigin;`
			`else`
			`origin_decref(porigin);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
			`}`

git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`num_commits++;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`for (i = 0, parent = commit->parents;`
			`i < MAXPARENT && parent;`
			`parent = parent->next, i++) {`
			`struct origin *porigin = parent_origin[i];`
			`if (!porigin)`
			`continue;`
			`if (pass_blame_to_parent(sb, origin, porigin))`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`goto finish;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00
			`/*`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`* Optionally find moves in parents' files.`
git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00			`*/`
			`if (opt & PICKAXE_BLAME_MOVE)`
			`for (i = 0, parent = commit->parents;`
			`i < MAXPARENT && parent;`
			`parent = parent->next, i++) {`
			`struct origin *porigin = parent_origin[i];`
			`if (!porigin)`
			`continue;`
			`if (find_move_in_parent(sb, origin, porigin))`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`goto finish;`
git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00			`}`

git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00			`/*`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`* Optionally find copies from parents' files.`
git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00			`*/`
			`if (opt & PICKAXE_BLAME_COPY)`
			`for (i = 0, parent = commit->parents;`
			`i < MAXPARENT && parent;`
			`parent = parent->next, i++) {`
			`struct origin *porigin = parent_origin[i];`
			`if (find_copy_in_parent(sb, origin, parent->item,`
			`porigin, opt))`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`goto finish;`
git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00			`}`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00
			`finish:`
blame: drop blob data after passing blame to the parent We used to keep the blob data for each origin that has any remaining line in the result, but this will get very costly with a huge file that has a deep history. This patch releases the blob after we ran diff between the child rev and its parents. When passing blame from a parent to its parent (i.e. the grandparent), the blob data for the parent may need to be read again, but it should be relatively cheap, thanks to delta-base cache. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-12 01:05:50 +01:00			`for (i = 0; i < MAXPARENT; i++) {`
			`if (parent_origin[i]) {`
			`drop_origin_blob(parent_origin[i]);`
			`origin_decref(parent_origin[i]);`
			`}`
			`}`
			`drop_origin_blob(origin);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
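
The Octopus-merge simplification in `pass_blame()` above (a later parent whose blob matches an earlier parent's is dropped) can be sketched on its own. This is an illustrative sketch under assumptions, not the code from this file: `toy_seen`, `toy_dedupe` and `TOY_HASH_LEN` are hypothetical names, and plain 20-byte arrays stand in for `blob_sha1`. As in the original loop, a kept entry stays at its parent's index and skipped slots are left `NULL`.

```c
#include <string.h>

#define TOY_HASH_LEN 20	/* assumed hash width for this sketch */

/* Returns 1 if 'hash' equals any non-NULL entry in kept[0..n-1]. */
static int toy_seen(const unsigned char *kept[], int n,
		    const unsigned char *hash)
{
	int j;

	for (j = 0; j < n; j++)
		if (kept[j] && !memcmp(kept[j], hash, TOY_HASH_LEN))
			return 1;
	return 0;
}

/*
 * For each parent slot i, keep hashes[i] in kept[i] unless it is
 * missing or an earlier slot already holds the same hash.
 * Returns the number of slots kept.
 */
static int toy_dedupe(const unsigned char *hashes[],
		      const unsigned char *kept[], int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++) {
		kept[i] = NULL;
		if (!hashes[i] || toy_seen(kept, i, hashes[i]))
			continue;
		kept[i] = hashes[i];
		count++;
	}
	return count;
}
```

Keeping the surviving entries at their original indices means the later per-parent loops can still walk the parent list and the array in step, skipping empty slots.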

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Information on commits, used for output.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`struct commit_info`
			`{`
General const correctness fixes We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-07 02:44:17 +01:00			`const char *author;`
			`const char *author_mail;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`unsigned long author_time;`
General const correctness fixes We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-07 02:44:17 +01:00			`const char *author_tz;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
			`/* filled only when asked for details */`
General const correctness fixes We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-07 02:44:17 +01:00			`const char *committer;`
			`const char *committer_mail;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`unsigned long committer_time;`
General const correctness fixes We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-07 02:44:17 +01:00			`const char *committer_tz;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
General const correctness fixes We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-07 02:44:17 +01:00			`const char *summary;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`};`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Parse author/committer line in the commit object buffer`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static void get_ac_line(const char *inbuf, const char *what,`
General const correctness fixes We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-07 02:44:17 +01:00			`int bufsz, char *person, const char **mail,`
			`unsigned long *time, const char **tz)`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`{`
Apply mailmap in git-blame output. This makes git-blame to use the same mailmap used by git-shortlog. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-27 09:42:15 +02:00			`int len, tzlen, maillen;`
			`char *tmp, *endp, *timepos;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
			`tmp = strstr(inbuf, what);`
			`if (!tmp)`
			`goto error_out;`
			`tmp += strlen(what);`
			`endp = strchr(tmp, '\n');`
			`if (!endp)`
			`len = strlen(tmp);`
			`else`
			`len = endp - tmp;`
			`if (bufsz <= len) {`
			`error_out:`
			`/* Ugh */`
General const correctness fixes We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-07 02:44:17 +01:00			`*mail = *tz = "(unknown)";`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`*time = 0;`
			`return;`
			`}`
			`memcpy(person, tmp, len);`

			`tmp = person;`
			`tmp += len;`
			`*tmp = 0;`
			`while (*tmp != ' ')`
			`tmp--;`
			`*tz = tmp+1;`
Apply mailmap in git-blame output. This makes git-blame to use the same mailmap used by git-shortlog. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-27 09:42:15 +02:00			`tzlen = (person+len)-(tmp+1);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
			`*tmp = 0;`
			`while (*tmp != ' ')`
			`tmp--;`
			`*time = strtoul(tmp, NULL, 10);`
Apply mailmap in git-blame output. This makes git-blame to use the same mailmap used by git-shortlog. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-27 09:42:15 +02:00			`timepos = tmp;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
			`*tmp = 0;`
			`while (*tmp != ' ')`
			`tmp--;`
			`*mail = tmp + 1;`
			`*tmp = 0;`
Apply mailmap in git-blame output. This makes git-blame to use the same mailmap used by git-shortlog. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-27 09:42:15 +02:00			`maillen = timepos - tmp;`

			`if (!mailmap.nr)`
			`return;`

			`/*`
			`* mailmap expansion may make the name longer.`
			`* make room by pushing stuff down.`
			`*/`
			`tmp = person + bufsz - (tzlen + 1);`
			`memmove(tmp, *tz, tzlen);`
			`tmp[tzlen] = 0;`
			`*tz = tmp;`

			`tmp = tmp - (maillen + 1);`
			`memmove(tmp, *mail, maillen);`
			`tmp[maillen] = 0;`
			`*mail = tmp;`

			`/*`
			`* Now, convert e-mail using mailmap`
			`*/`
			`map_email(&mailmap, tmp + 1, person, tmp-person-1);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
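
The right-to-left scan in `get_ac_line()` can be hard to follow because it splits the line in place. The following is a minimal standalone sketch of the same idea, assuming an ident line of the form `Name <mail> time tz`. The names `ident_parts`, `last_space`, `copy_field`, and `split_ident` are hypothetical and do not exist in this file. Unlike the loops above, which appear to rely on the line being well formed, this sketch bounds every backward scan by the start of the line.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch; not part of the file above. */
struct ident_parts {
	char name[128];
	char mail[128];
	unsigned long time;
	char tz[16];
};

/* Return the last space in [start, end), or NULL if there is none. */
static const char *last_space(const char *start, const char *end)
{
	while (end > start) {
		end--;
		if (*end == ' ')
			return end;
	}
	return NULL;
}

/* Copy [begin, end) into dst as a NUL-terminated string; -1 if too long. */
static int copy_field(char *dst, size_t dstsz,
		      const char *begin, const char *end)
{
	size_t len = (size_t)(end - begin);

	if (len >= dstsz)
		return -1;
	memcpy(dst, begin, len);
	dst[len] = 0;
	return 0;
}

/*
 * Split "Name <mail> 1161298804 +0200" into four fields by locating
 * the last three spaces, scanning from the end of the line.
 * Returns 0 on success, -1 on a malformed or oversized line.
 */
static int split_ident(const char *line, struct ident_parts *out)
{
	const char *end = line + strlen(line);
	const char *sp_tz = last_space(line, end);
	const char *sp_time = sp_tz ? last_space(line, sp_tz) : NULL;
	const char *sp_mail = sp_time ? last_space(line, sp_time) : NULL;

	if (!sp_mail)
		return -1;
	if (copy_field(out->name, sizeof(out->name), line, sp_mail) ||
	    copy_field(out->mail, sizeof(out->mail), sp_mail + 1, sp_time) ||
	    copy_field(out->tz, sizeof(out->tz), sp_tz + 1, end))
		return -1;
	out->time = strtoul(sp_time + 1, NULL, 10);
	return 0;
}
```

Scanning from the right is what lets the name itself contain spaces: only the last three separators are significant. As in the original, the mail field keeps its angle brackets.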

			`static void get_commit_info(struct commit *commit,`
			`struct commit_info *ret,`
			`int detailed)`
			`{`
			`int len;`
			`char *tmp, *endp;`
			`static char author_buf[1024];`
			`static char committer_buf[1024];`
			`static char summary_buf[1024];`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* We've operated without save_commit_buffer, so`
git-pickaxe: do not keep commit buffer. We need the commit buffer data while generating the final result, but until then we do not need them. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 08:49:31 +02:00			`* we now need to populate them for output.`
			`*/`
			`if (!commit->buffer) {`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
git-pickaxe: do not keep commit buffer. We need the commit buffer data while generating the final result, but until then we do not need them. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 08:49:31 +02:00			`unsigned long size;`
			`commit->buffer =`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`read_sha1_file(commit->object.sha1, &type, &size);`
blame: check return value from read_sha1_file() Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-08-25 10:26:20 +02:00			`if (!commit->buffer)`
			`die("Cannot read commit %s",`
			`sha1_to_hex(commit->object.sha1));`
git-pickaxe: do not keep commit buffer. We need the commit buffer data while generating the final result, but until then we do not need them. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 08:49:31 +02:00			`}`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`ret->author = author_buf;`
			`get_ac_line(commit->buffer, "\nauthor ",`
			`sizeof(author_buf), author_buf, &ret->author_mail,`
			`&ret->author_time, &ret->author_tz);`

			`if (!detailed)`
			`return;`

			`ret->committer = committer_buf;`
			`get_ac_line(commit->buffer, "\ncommitter ",`
			`sizeof(committer_buf), committer_buf, &ret->committer_mail,`
			`&ret->committer_time, &ret->committer_tz);`

			`ret->summary = summary_buf;`
			`tmp = strstr(commit->buffer, "\n\n");`
			`if (!tmp) {`
			`error_out:`
			`sprintf(summary_buf, "(%s)", sha1_to_hex(commit->object.sha1));`
			`return;`
			`}`
			`tmp += 2;`
			`endp = strchr(tmp, '\n');`
			`if (!endp)`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`endp = tmp + strlen(tmp);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`len = endp - tmp;`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`if (len >= sizeof(summary_buf) || len == 0)`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`goto error_out;`
			`memcpy(summary_buf, tmp, len);`
			`summary_buf[len] = 0;`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* To allow LF and other nonportable characters in pathnames,`
			`* they are c-style quoted as needed.`
			`*/`
git-blame --porcelain: quote filename in c-style when needed. Otherwise a pathname that has funny characters such as LF would screw up the parsing programs of the output. Strictly speaking, this is not backward compatible, but the current output for pathnames that have embedded LF and such cannot be sanely parsed anyway, and pathnames that only use characters from the portable pathname character set won't be affected. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:42:31 +01:00			`static void write_filename_info(const char *path)`
			`{`
			`printf("filename ");`
Full rework of quote_c_style and write_name_quoted. * quote_c_style works on a strbuf instead of a wild buffer. * quote_c_style is now clever enough to not add double quotes if not needed. * write_name_quoted inherits those advantages, but also take a different set of arguments. Now instead of asking for quotes or not, you pass a "terminator". If it's \0 then we assume you don't want to escape, else C escaping is performed. In any case, the terminator is also appended to the stream. It also no longer takes the prefix/prefix_len arguments, as it's seldomly used, and makes some optimizations harder. * write_name_quotedpfx is created to work like write_name_quoted and take the prefix/prefix_len arguments. Thanks to those API changes, diff.c has somehow lost weight, thanks to the removal of functions that were wrappers around the old write_name_quoted trying to give it a semantics like the new one, but performing a lot of allocations for this goal. Now we always write directly to the stream, no intermediate allocation is performed. As a side effect of the refactor in builtin-apply.c, the length of the bar graphs in diffstats are not affected anymore by the fact that the path was clipped. Signed-off-by: Pierre Habouzit <madcoder@debian.org> 2007-09-20 00:42:15 +02:00			`write_name_quoted(path, stdout, '\n');`
git-blame --porcelain: quote filename in c-style when needed. Otherwise a pathname that has funny characters such as LF would screw up the parsing programs of the output. Strictly speaking, this is not backward compatible, but the current output for pathnames that have embedded LF and such cannot be sanely parsed anyway, and pathnames that only use characters from the portable pathname character set won't be affected. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:42:31 +01:00			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* The blame_entry is found to be guilty for the range. Mark it`
			`* as such, and show it in incremental output.`
			`*/`
git-blame --incremental This adds --incremental option to help GUI porcelains to show the result from git-blame incrementally. The output gives the origin information in the same format as the porcelain format. The first line has commit object name, the line number of the first line in the group in the original file, the line number of that file in the final image, and number of lines in the group. Then subsequent lines show the metainformation for the commit when the commit is shown for the first time, except the filename information is always shown (we cannot even make it conditional to -C option as blame always follows the renaming of the file wholesale). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:34:06 +01:00			`static void found_guilty_entry(struct blame_entry *ent)`
			`{`
			`if (ent->guilty)`
			`return;`
			`ent->guilty = 1;`
			`if (incremental) {`
			`struct origin *suspect = ent->suspect;`

			`printf("%s %d %d %d\n",`
			`sha1_to_hex(suspect->commit->object.sha1),`
			`ent->s_lno + 1, ent->lno + 1, ent->num_lines);`
			`if (!(suspect->commit->object.flags & METAINFO_SHOWN)) {`
			`struct commit_info ci;`
			`suspect->commit->object.flags |= METAINFO_SHOWN;`
			`get_commit_info(suspect->commit, &ci, 1);`
			`printf("author %s\n", ci.author);`
			`printf("author-mail %s\n", ci.author_mail);`
			`printf("author-time %lu\n", ci.author_time);`
			`printf("author-tz %s\n", ci.author_tz);`
			`printf("committer %s\n", ci.committer);`
			`printf("committer-mail %s\n", ci.committer_mail);`
			`printf("committer-time %lu\n", ci.committer_time);`
			`printf("committer-tz %s\n", ci.committer_tz);`
			`printf("summary %s\n", ci.summary);`
			`if (suspect->commit->object.flags & UNINTERESTING)`
			`printf("boundary\n");`
			`}`
git-blame --porcelain: quote filename in c-style when needed. Otherwise a pathname that has funny characters such as LF would screw up the parsing programs of the output. Strictly speaking, this is not backward compatible, but the current output for pathnames that have embedded LF and such cannot be sanely parsed anyway, and pathnames that only use characters from the portable pathname character set won't be affected. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:42:31 +01:00			`write_filename_info(suspect->path);`
Don't fflush(stdout) when it's not helpful This patch arose from a discussion started by Jim Meyering's patch whose intention was to provide better diagnostics for failed writes. Linus proposed a better way to do things, which also had the added benefit that adding a fflush() to git-log-* operations and incremental git-blame operations could improve interactive respose time feel, at the cost of making things a bit slower when we aren't piping the output to a downstream program. This patch skips the fflush() calls when stdout is a regular file, or if the environment variable GIT_FLUSH is set to "0". This latter can speed up a command such as: GIT_FLUSH=0 strace -c -f -e write time git-rev-list HEAD | wc -l a tiny amount. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-29 19:40:46 +02:00			`maybe_flush_or_die(stdout, "stdout");`
git-blame --incremental This adds --incremental option to help GUI porcelains to show the result from git-blame incrementally. The output gives the origin information in the same format as the porcelain format. The first line has commit object name, the line number of the first line in the group in the original file, the line number of that file in the final image, and number of lines in the group. Then subsequent lines show the metainformation for the commit when the commit is shown for the first time, except the filename information is always shown (we cannot even make it conditional to -C option as blame always follows the renaming of the file wholesale). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:34:06 +01:00			`}`
			`}`
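
For a consumer of the incremental output, the group header printed by `found_guilty_entry()` is four space-separated fields: the commit object name, the first line number in the original file, the first line number in the final image, and the number of lines in the group. A minimal sketch of reading that header follows. The `blame_group` type and `parse_group_header` function are hypothetical illustrations, not part of git, and the sketch assumes a 40-character hexadecimal object name as produced by `sha1_to_hex()`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one incremental-output group header. */
struct blame_group {
	char sha1[41];	/* 40 hex digits plus NUL */
	int orig_lno;	/* 1-based line in the blamed commit's file */
	int final_lno;	/* 1-based line in the final image */
	int num_lines;	/* number of lines in this group */
};

/*
 * Parse "<sha1> <orig_lno> <final_lno> <num_lines>".
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_group_header(const char *line, struct blame_group *g)
{
	if (sscanf(line, "%40s %d %d %d", g->sha1,
		   &g->orig_lno, &g->final_lno, &g->num_lines) != 4)
		return -1;
	/* Reject object names shorter than 40 characters. */
	return strlen(g->sha1) == 40 ? 0 : -1;
}
```

Lines such as `author ...` or `filename ...` that follow the header fail this parse, so a reader can use the return value to tell a new group from the metadata lines of the current one.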

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* The main loop -- while the scoreboard has lines whose true origin`
Assorted typo fixes Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-04 05:49:16 +01:00			`* is still unknown, pick one blame_entry, and allow its current`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`* suspect to pass blames to its parents.`
			`*/`
git-blame --incremental This adds --incremental option to help GUI porcelains to show the result from git-blame incrementally. The output gives the origin information in the same format as the porcelain format. The first line has commit object name, the line number of the first line in the group in the original file, the line number of that file in the final image, and number of lines in the group. Then subsequent lines show the metainformation for the commit when the commit is shown for the first time, except the filename information is always shown (we cannot even make it conditional to -C option as blame always follows the renaming of the file wholesale). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:34:06 +01:00			`static void assign_blame(struct scoreboard *sb, struct rev_info *revs, int opt)`
			`{`
			`while (1) {`
			`struct blame_entry *ent;`
			`struct commit *commit;`
			`struct origin *suspect = NULL;`

			`/* find one suspect to break down */`
			`for (ent = sb->ent; !suspect && ent; ent = ent->next)`
			`if (!ent->guilty)`
			`suspect = ent->suspect;`
			`if (!suspect)`
			`return; /* all done */`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* We will use this suspect later in the loop,`
			`* so hold onto it in the meantime.`
			`*/`
git-blame --incremental This adds --incremental option to help GUI porcelains to show the result from git-blame incrementally. The output gives the origin information in the same format as the porcelain format. The first line has commit object name, the line number of the first line in the group in the original file, the line number of that file in the final image, and number of lines in the group. Then subsequent lines show the metainformation for the commit when the commit is shown for the first time, except the filename information is always shown (we cannot even make it conditional to -C option as blame always follows the renaming of the file wholesale). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:34:06 +01:00			`origin_incref(suspect);`
			`commit = suspect->commit;`
			`if (!commit->object.parsed)`
			`parse_commit(commit);`
			`if (!(commit->object.flags & UNINTERESTING) &&`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`!(revs->max_age != -1 && commit->date < revs->max_age))`
git-blame --incremental This adds --incremental option to help GUI porcelains to show the result from git-blame incrementally. The output gives the origin information in the same format as the porcelain format. The first line has commit object name, the line number of the first line in the group in the original file, the line number of that file in the final image, and number of lines in the group. Then subsequent lines show the metainformation for the commit when the commit is shown for the first time, except the filename information is always shown (we cannot even make it conditional to -C option as blame always follows the renaming of the file wholesale). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:34:06 +01:00			`pass_blame(sb, suspect, opt);`
			`else {`
			`commit->object.flags |= UNINTERESTING;`
			`if (commit->object.parsed)`
			`mark_parents_uninteresting(commit);`
			`}`
			`/* treat root commit as boundary */`
			`if (!commit->parents && !show_root)`
			`commit->object.flags |= UNINTERESTING;`

			`/* Take responsibility for the remaining entries */`
			`for (ent = sb->ent; ent; ent = ent->next)`
blame: cmp_suspect is not "cmp" anymore. The earlier round makes the function return "is it different" and it does not return a value suitable for sorting anymore. Reverse the logic to return "are they the same suspect" instead, and rename it to "same_suspect()". Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-21 07:37:51 +01:00			`if (same_suspect(ent->suspect, suspect))`
git-blame --incremental This adds --incremental option to help GUI porcelains to show the result from git-blame incrementally. The output gives the origin information in the same format as the porcelain format. The first line has commit object name, the line number of the first line in the group in the original file, the line number of that file in the final image, and number of lines in the group. Then subsequent lines show the metainformation for the commit when the commit is shown for the first time, except the filename information is always shown (we cannot even make it conditional to -C option as blame always follows the renaming of the file wholesale). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:34:06 +01:00			`found_guilty_entry(ent);`
			`origin_decref(suspect);`

			`if (DEBUG) /* sanity */`
			`sanity_check_refcnt(sb);`
			`}`
			`}`

			`static const char *format_time(unsigned long time, const char *tz_str,`
			`int show_raw_time)`
			`{`
			`static char time_buf[128];`
			`time_t t = time;`
			`int minutes, tz;`
			`struct tm *tm;`

			`if (show_raw_time) {`
			`sprintf(time_buf, "%lu %s", time, tz_str);`
			`return time_buf;`
			`}`

			`tz = atoi(tz_str);`
			`minutes = tz < 0 ? -tz : tz;`
			`minutes = (minutes / 100)*60 + (minutes % 100);`
			`minutes = tz < 0 ? -minutes : minutes;`
			`t = time + minutes * 60;`
			`tm = gmtime(&t);`

			`strftime(time_buf, sizeof(time_buf), "%Y-%m-%d %H:%M:%S ", tm);`
			`strcat(time_buf, tz_str);`
			`return time_buf;`
			`}`
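
The timezone arithmetic in `format_time()` treats the `+hhmm` string as a decimal number and then separates hours from minutes. The sketch below isolates that step and pairs it with a caller-supplied buffer instead of a static one. `tz_offset_minutes` and `format_local` are hypothetical names introduced for illustration only. The sketch assumes `tz_str` is a well-formed `+hhmm` or `-hhmm` string.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Convert "+hhmm" or "-hhmm" to a signed offset in minutes. */
static int tz_offset_minutes(const char *tz_str)
{
	int tz = atoi(tz_str);		/* "+0530" becomes 530 */
	int abs_tz = tz < 0 ? -tz : tz;
	int minutes = (abs_tz / 100) * 60 + (abs_tz % 100);

	return tz < 0 ? -minutes : minutes;
}

/*
 * Render `when` (seconds since the epoch) as wall-clock time in the
 * zone named by tz_str, followed by the zone string itself.
 * Returns 0 on success, -1 if the time cannot be converted or buf is
 * too small.
 */
static int format_local(unsigned long when, const char *tz_str,
			char *buf, size_t bufsz)
{
	/* Shift the timestamp, then format it as if it were UTC. */
	time_t t = (time_t)when + (time_t)tz_offset_minutes(tz_str) * 60;
	struct tm *tm = gmtime(&t);
	size_t n;

	if (!tm)
		return -1;
	n = strftime(buf, bufsz, "%Y-%m-%d %H:%M:%S ", tm);
	if (!n || n + strlen(tz_str) >= bufsz)
		return -1;
	strcpy(buf + n, tz_str);
	return 0;
}
```

Passing the buffer in avoids the static `time_buf`, whose contents are overwritten by the next call. Whether that matters depends on how callers use the returned pointer.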

git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`#define OUTPUT_ANNOTATE_COMPAT 001`
			`#define OUTPUT_LONG_OBJECT_NAME 002`
			`#define OUTPUT_RAW_TIMESTAMP 004`
			`#define OUTPUT_PORCELAIN 010`
			`#define OUTPUT_SHOW_NAME 020`
			`#define OUTPUT_SHOW_NUMBER 040`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`#define OUTPUT_SHOW_SCORE 0100`
blame -s: suppress author name and time. With this "git blame -b -s HEAD~n..HEAD" becomes a nicer way to review the result of recent changes in context. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 00:50:45 +02:00			`#define OUTPUT_NO_AUTHOR 0200`
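
The `OUTPUT_*` constants are written in octal, so each one occupies a single bit and they can be combined in one `int`. A small sketch of how such a mask is tested follows. It copies two of the definitions above so that it compiles on its own. The helper names `hex_display_length` and `wants_raw_time` are hypothetical.

```c
/* Copied from the definitions above so the sketch is self-contained. */
#define OUTPUT_LONG_OBJECT_NAME 002
#define OUTPUT_RAW_TIMESTAMP 004

/* Number of hex digits of the object name to show. */
static int hex_display_length(int opt)
{
	return (opt & OUTPUT_LONG_OBJECT_NAME) ? 40 : 8;
}

/* Normalize the masked bit to exactly 0 or 1. */
static int wants_raw_time(int opt)
{
	return !!(opt & OUTPUT_RAW_TIMESTAMP);
}
```

The double negation matters when the result is stored or compared as a boolean: `opt & OUTPUT_RAW_TIMESTAMP` alone yields 4, not 1.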
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
			`static void emit_porcelain(struct scoreboard *sb, struct blame_entry *ent)`
			`{`
			`int cnt;`
			`const char *cp;`
			`struct origin *suspect = ent->suspect;`
			`char hex[41];`

			`strcpy(hex, sha1_to_hex(suspect->commit->object.sha1));`
			`printf("%s%c%d %d %d\n",`
			`hex,`
			`ent->guilty ? ' ' : '*', // purely for debugging`
			`ent->s_lno + 1,`
			`ent->lno + 1,`
			`ent->num_lines);`
			`if (!(suspect->commit->object.flags & METAINFO_SHOWN)) {`
			`struct commit_info ci;`
			`suspect->commit->object.flags |= METAINFO_SHOWN;`
			`get_commit_info(suspect->commit, &ci, 1);`
			`printf("author %s\n", ci.author);`
			`printf("author-mail %s\n", ci.author_mail);`
			`printf("author-time %lu\n", ci.author_time);`
			`printf("author-tz %s\n", ci.author_tz);`
			`printf("committer %s\n", ci.committer);`
			`printf("committer-mail %s\n", ci.committer_mail);`
			`printf("committer-time %lu\n", ci.committer_time);`
			`printf("committer-tz %s\n", ci.committer_tz);`
git-blame --porcelain: quote filename in c-style when needed. Otherwise a pathname that has funny characters such as LF would screw up the parsing programs of the output. Strictly speaking, this is not backward compatible, but the current output for pathnames that have embedded LF and such cannot be sanely parsed anyway, and pathnames that only use characters from the portable pathname character set won't be affected. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:42:31 +01:00			`write_filename_info(suspect->path);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`printf("summary %s\n", ci.summary);`
git-blame: show lines attributed to boundary commits differently. When blaming with revision ranges, often many lines are attributed to different commits at the boundary, but they are not interesting for the purpose of finding project history during that revision range. This outputs the lines blamed on boundary commits differently. When showing "human format" output, their SHA-1 are shown with '^' prefixed. In "porcelain format", the commit will be shown with an extra attribute line "boundary". Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-02 05:45:45 +01:00			`if (suspect->commit->object.flags & UNINTERESTING)`
			`printf("boundary\n");`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
			`else if (suspect->commit->object.flags & MORE_THAN_ONE_PATH)`
git-blame --porcelain: quote filename in c-style when needed. Otherwise a pathname that has funny characters such as LF would screw up the parsing programs of the output. Strictly speaking, this is not backward compatible, but the current output for pathnames that have embedded LF and such cannot be sanely parsed anyway, and pathnames that only use characters from the portable pathname character set won't be affected. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:42:31 +01:00			`write_filename_info(suspect->path);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
			`cp = nth_line(sb, ent->lno);`
			`for (cnt = 0; cnt < ent->num_lines; cnt++) {`
			`char ch;`
			`if (cnt)`
			`printf("%s %d %d\n", hex,`
			`ent->s_lno + 1 + cnt,`
			`ent->lno + 1 + cnt);`
			`putchar('\t');`
			`do {`
			`ch = *cp++;`
			`putchar(ch);`
			`} while (ch != '\n' &&`
			`cp < sb->final_buf + sb->final_buf_size);`
			`}`
			`}`

			`static void emit_other(struct scoreboard *sb, struct blame_entry *ent, int opt)`
			`{`
			`int cnt;`
			`const char *cp;`
			`struct origin *suspect = ent->suspect;`
			`struct commit_info ci;`
			`char hex[41];`
			`int show_raw_time = !!(opt & OUTPUT_RAW_TIMESTAMP);`

			`get_commit_info(suspect->commit, &ci, 1);`
			`strcpy(hex, sha1_to_hex(suspect->commit->object.sha1));`

			`cp = nth_line(sb, ent->lno);`
			`for (cnt = 0; cnt < ent->num_lines; cnt++) {`
			`char ch;`
git-blame: show lines attributed to boundary commits differently. When blaming with revision ranges, often many lines are attributed to different commits at the boundary, but they are not interesting for the purpose of finding project history during that revision range. This outputs the lines blamed on boundary commits differently. When showing "human format" output, their SHA-1 are shown with '^' prefixed. In "porcelain format", the commit will be shown with an extra attribute line "boundary". Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-02 05:45:45 +01:00			`int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? 40 : 8;`

			`if (suspect->commit->object.flags & UNINTERESTING) {`
annotate: fix for cvsserver. git-cvsserver does not want the boundary commits shown any differently. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-06 10:52:04 +01:00			`if (blank_boundary)`
			`memset(hex, ' ', length);`
			`else if (!cmd_is_annotate) {`
blame: -b (blame.blankboundary) and --root (blame.showroot) When blame.blankboundary is set (or -b option is given), commit object names are blanked out in the "human readable" output format for boundary commits. When blame.showroot is not set (or --root is not given), the root commits are treated as boundary commits. The code still attributes the lines to them, but with -b their object names are not shown. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-18 23:04:38 +01:00			`length--;`
			`putchar('^');`
			`}`
git-blame: show lines attributed to boundary commits differently. When blaming with revision ranges, often many lines are attributed to different commits at the boundary, but they are not interesting for the purpose of finding project history during that revision range. This outputs the lines blamed on boundary commits differently. When showing "human format" output, their SHA-1 are shown with '^' prefixed. In "porcelain format", the commit will be shown with an extra attribute line "boundary". Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-02 05:45:45 +01:00			`}`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
git-blame: show lines attributed to boundary commits differently. When blaming with revision ranges, often many lines are attributed to different commits at the boundary, but they are not interesting for the purpose of finding project history during that revision range. This outputs the lines blamed on boundary commits differently. When showing "human format" output, their SHA-1 are shown with '^' prefixed. In "porcelain format", the commit will be shown with an extra attribute line "boundary". Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-02 05:45:45 +01:00			`printf("%.*s", length, hex);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`if (opt & OUTPUT_ANNOTATE_COMPAT)`
			`printf("\t(%10s\t%10s\t%d)", ci.author,`
			`format_time(ci.author_time, ci.author_tz,`
			`show_raw_time),`
			`ent->lno + 1 + cnt);`
			`else {`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`if (opt & OUTPUT_SHOW_SCORE)`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`printf(" %*d %02d",`
			`max_score_digits, ent->score,`
			`ent->suspect->refcnt);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`if (opt & OUTPUT_SHOW_NAME)`
			`printf(" %-*.*s", longest_file, longest_file,`
			`suspect->path);`
			`if (opt & OUTPUT_SHOW_NUMBER)`
			`printf(" %*d", max_orig_digits,`
			`ent->s_lno + 1 + cnt);`
blame -s: suppress author name and time. With this "git blame -b -s HEAD~n..HEAD" becomes a nicer way to review the result of recent changes in context. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 00:50:45 +02:00
			`if (!(opt & OUTPUT_NO_AUTHOR))`
			`printf(" (%-*.*s %10s",`
			`longest_author, longest_author,`
			`ci.author,`
			`format_time(ci.author_time,`
			`ci.author_tz,`
			`show_raw_time));`
			`printf(" %*d) ",`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`max_digits, ent->lno + 1 + cnt);`
			`}`
			`do {`
			`ch = *cp++;`
			`putchar(ch);`
			`} while (ch != '\n' &&`
			`cp < sb->final_buf + sb->final_buf_size);`
			`}`
			`}`

			`static void output(struct scoreboard *sb, int option)`
			`{`
			`struct blame_entry *ent;`

			`if (option & OUTPUT_PORCELAIN) {`
			`for (ent = sb->ent; ent; ent = ent->next) {`
			`struct blame_entry *oth;`
			`struct origin *suspect = ent->suspect;`
			`struct commit *commit = suspect->commit;`
			`if (commit->object.flags & MORE_THAN_ONE_PATH)`
			`continue;`
			`for (oth = ent->next; oth; oth = oth->next) {`
			`if ((oth->suspect->commit != commit) ||`
			`!strcmp(oth->suspect->path, suspect->path))`
			`continue;`
			`commit->object.flags |= MORE_THAN_ONE_PATH;`
			`break;`
			`}`
			`}`
			`}`

			`for (ent = sb->ent; ent; ent = ent->next) {`
			`if (option & OUTPUT_PORCELAIN)`
			`emit_porcelain(sb, ent);`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`else {`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`emit_other(sb, ent, option);`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`}`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* To allow quick access to the contents of nth line in the`
			`* final image, prepare an index in the scoreboard.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static int prepare_lines(struct scoreboard *sb)`
			`{`
			`const char *buf = sb->final_buf;`
			`unsigned long len = sb->final_buf_size;`
			`int num = 0, incomplete = 0, bol = 1;`

			`if (len && buf[len-1] != '\n')`
			`incomplete++; /* incomplete line at the end */`
			`while (len--) {`
			`if (bol) {`
			`sb->lineno = xrealloc(sb->lineno,`
			`sizeof(int* ) * (num + 1));`
			`sb->lineno[num] = buf - sb->final_buf;`
			`bol = 0;`
			`}`
			`if (*buf++ == '\n') {`
			`num++;`
			`bol = 1;`
			`}`
			`}`
git-pickaxe: fix nth_line() We would want to be able to refer to the end of the file as "the beginning of Nth line" for a file that is N lines long. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 03:48:18 +02:00			`sb->lineno = xrealloc(sb->lineno,`
			`sizeof(int* ) * (num + incomplete + 1));`
			`sb->lineno[num + incomplete] = buf - sb->final_buf;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`sb->num_lines = num + incomplete;`
			`return sb->num_lines;`
			`}`
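
The index built by `prepare_lines()` can be tried out in isolation. The following is a minimal sketch, not the code used by blame: `struct line_index` and `build_line_index()` are hypothetical names, plain `realloc` with geometric growth stands in for `xrealloc`, and offsets are stored as `size_t`. As in the function above, one extra entry past the last line records the end of the buffer, so the end of an N-line file can be addressed as "the beginning of line N".

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in for the scoreboard's line index: offsets[i] is the
 * byte offset where line i begins, and offsets[count] is the buffer size.
 */
struct line_index {
	size_t *offsets;
	size_t count;
};

/* Build the index in one pass; returns 0 on success, -1 if allocation fails. */
int build_line_index(const char *buf, size_t len, struct line_index *idx)
{
	size_t i, n = 0, cap = 0;
	int bol = 1; /* are we at the beginning of a line? */

	idx->offsets = NULL;
	idx->count = 0;
	for (i = 0; i <= len; i++) {
		/* Record the start of each line, plus one sentinel at the end. */
		if (bol || i == len) {
			if (n == cap) {
				size_t *grown;

				cap = cap ? cap * 2 : 16;
				grown = realloc(idx->offsets, cap * sizeof(*grown));
				if (!grown) {
					free(idx->offsets);
					idx->offsets = NULL;
					return -1;
				}
				idx->offsets = grown;
			}
			idx->offsets[n++] = i;
			bol = 0;
		}
		if (i < len && buf[i] == '\n')
			bol = 1;
	}
	idx->count = n - 1; /* the last entry is the sentinel */
	return 0;
}
```

With this layout, the text of line `i` occupies bytes `offsets[i]` up to, but not including, `offsets[i + 1]`, whether or not the final line ends with a newline.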

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Add phony grafts for use with -S; this is primarily to`
			`* support git-cvsserver that wants to give a linear history`
			`* to its clients.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static int read_ancestry(const char *graft_file)`
			`{`
			`FILE *fp = fopen(graft_file, "r");`
			`char buf[1024];`
			`if (!fp)`
			`return -1;`
			`while (fgets(buf, sizeof(buf), fp)) {`
			`/* The format is just "Commit Parent1 Parent2 ...\n" */`
			`int len = strlen(buf);`
			`struct commit_graft *graft = read_graft_line(buf, len);`
git-annotate: fix -S on graft file with comments. The graft file can contain comment lines and read_graft_line can return NULL for such an input, which should be skipped by the reader. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-10 22:39:01 +01:00			`if (graft)`
			`register_commit_graft(graft, 0);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
			`fclose(fp);`
			`return 0;`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* How many columns do we need to show line numbers in decimal?`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static int lineno_width(int lines)`
			`{`
git-blame: do not indent with spaces. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-10 03:15:24 +02:00			`int i, width;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
git-blame: do not indent with spaces. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-10 03:15:24 +02:00			`for (width = 1, i = 10; i <= lines + 1; width++)`
			`i *= 10;`
			`return width;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
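
To see how the loop in `lineno_width()` behaves at the boundaries, the same loop can be copied into a standalone function. `decimal_width()` below is a hypothetical name for that copy; nothing else is changed. Because the comparison is against `lines + 1`, the width steps up one value early: an argument of 9 already yields 2 columns, and 99 yields 3.

```c
/* Standalone copy of the loop above under a hypothetical name. */
int decimal_width(int lines)
{
	int i, width;

	/* Grow the width while the next power of ten still fits in lines + 1. */
	for (width = 1, i = 10; i <= lines + 1; width++)
		i *= 10;
	return width;
}
```

Note that `i` is a plain `int`, so this sketch (like the original loop) assumes `lines` stays well below `INT_MAX`.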

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* How many columns do we need to show line numbers, authors,`
			`* and filenames?`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static void find_alignment(struct scoreboard *sb, int *option)`
			`{`
			`int longest_src_lines = 0;`
			`int longest_dst_lines = 0;`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`unsigned largest_score = 0;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`struct blame_entry *e;`

			`for (e = sb->ent; e; e = e->next) {`
			`struct origin *suspect = e->suspect;`
			`struct commit_info ci;`
			`int num;`

git blame -C: fix output format tweaks when crossing file boundary. We used to get the case that more than two paths came from the same commit wrong when computing the output width and deciding to turn on --show-name option automatically. When we find that lines that came from a path that is different from what we started digging from, we should always turn --show-name on, and we should count the name length for all files involved. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-29 07:29:18 +01:00			`if (strcmp(suspect->path, sb->path))`
			`*option |= OUTPUT_SHOW_NAME;`
			`num = strlen(suspect->path);`
			`if (longest_file < num)`
			`longest_file = num;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`if (!(suspect->commit->object.flags & METAINFO_SHOWN)) {`
			`suspect->commit->object.flags |= METAINFO_SHOWN;`
			`get_commit_info(suspect->commit, &ci, 1);`
			`num = strlen(ci.author);`
			`if (longest_author < num)`
			`longest_author = num;`
			`}`
			`num = e->s_lno + e->num_lines;`
			`if (longest_src_lines < num)`
			`longest_src_lines = num;`
			`num = e->lno + e->num_lines;`
			`if (longest_dst_lines < num)`
			`longest_dst_lines = num;`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`if (largest_score < ent_score(sb, e))`
			`largest_score = ent_score(sb, e);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
			`max_orig_digits = lineno_width(longest_src_lines);`
			`max_digits = lineno_width(longest_dst_lines);`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`max_score_digits = lineno_width(largest_score);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* For debugging -- origin is refcounted, and this asserts that`
			`* we do not underflow.`
			`*/`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`static void sanity_check_refcnt(struct scoreboard *sb)`
			`{`
			`int baa = 0;`
			`struct blame_entry *ent;`

			`for (ent = sb->ent; ent; ent = ent->next) {`
git-pickaxe: tighten sanity checks. When compiled for debugging, make sure that refcnt sanity check code detects underflows in origin reference counting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-30 23:27:52 +01:00			`/* Nobody should have zero or negative refcnt */`
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`if (ent->suspect->refcnt <= 0) {`
			`fprintf(stderr, "%s in %s has negative refcnt %d\n",`
			`ent->suspect->path,`
			`sha1_to_hex(ent->suspect->commit->object.sha1),`
			`ent->suspect->refcnt);`
git-pickaxe: tighten sanity checks. When compiled for debugging, make sure that refcnt sanity check code detects underflows in origin reference counting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-30 23:27:52 +01:00			`baa = 1;`
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`}`
git-pickaxe: tighten sanity checks. When compiled for debugging, make sure that refcnt sanity check code detects underflows in origin reference counting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-30 23:27:52 +01:00			`}`
			`for (ent = sb->ent; ent; ent = ent->next) {`
			`/* Mark the ones that haven't been checked */`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`if (0 < ent->suspect->refcnt)`
			`ent->suspect->refcnt = -ent->suspect->refcnt;`
			`}`
			`for (ent = sb->ent; ent; ent = ent->next) {`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* ... then pick each and see if they have the`
			`* correct refcnt.`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`*/`
			`int found;`
			`struct blame_entry *e;`
			`struct origin *suspect = ent->suspect;`

			`if (0 < suspect->refcnt)`
			`continue;`
git-pickaxe: tighten sanity checks. When compiled for debugging, make sure that refcnt sanity check code detects underflows in origin reference counting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-30 23:27:52 +01:00			`suspect->refcnt = -suspect->refcnt; /* Unmark */`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`for (found = 0, e = sb->ent; e; e = e->next) {`
			`if (e->suspect != suspect)`
			`continue;`
			`found++;`
			`}`
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`if (suspect->refcnt != found) {`
			`fprintf(stderr, "%s in %s has refcnt %d, not %d\n",`
			`ent->suspect->path,`
			`sha1_to_hex(ent->suspect->commit->object.sha1),`
			`ent->suspect->refcnt, found);`
			`baa = 2;`
			`}`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`}`
			`if (baa) {`
			`int opt = 0160;`
			`find_alignment(sb, &opt);`
			`output(sb, opt);`
git-pickaxe: fix origin refcounting When we introduced the cached origin per commit, we gave up proper garbage collecting because it meant that commits hold onto their cached copy. There is no need to do so. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 04:18:50 +01:00			`die("Baa %d!", baa);`
git-pickaxe: WIP to refcount origin structure. The origin structure is allocated for each commit and path while the code traverse down it is copied into different blame entries. To avoid leaks, try refcounting them. This still seems to leak, which I haven't tracked down fully yet. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-29 12:07:40 +01:00			`}`
			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Used for the command line parsing; check if the path exists`
			`* in the working tree.`
			`*/`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`static int has_path_in_work_tree(const char *path)`
			`{`
			`struct stat st;`
			`return !lstat(path, &st);`
			`}`

git-pickaxe: introduce heuristics to avoid "trivial" chunks This adds scoring logic to blame_entry to prevent blames on very trivial chunks (e.g. lots of empty lines, indent followed by a closing brace) from being passed down to unrelated lines in the parent. The current heuristics are quite simple and may need to be tweaked later, but we need to start somewhere. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 00:37:12 +02:00			`static unsigned parse_score(const char *arg)`
			`{`
			`char *end;`
			`unsigned long score = strtoul(arg, &end, 10);`
			`if (*end)`
			`return 0;`
			`return score;`
			`}`
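
The behaviour of `parse_score()` on malformed input follows from how `strtoul` reports where parsing stopped. The sketch below copies the helper under the hypothetical name `parse_score_sketch()` so it can be exercised on its own; the only addition is an explicit cast on the return value.

```c
#include <stdlib.h>

/* Standalone copy of the helper above under a hypothetical name. */
unsigned parse_score_sketch(const char *arg)
{
	char *end;
	unsigned long score = strtoul(arg, &end, 10);

	if (*end)
		return 0; /* something other than digits followed: reject */
	return (unsigned)score;
}
```

One consequence of this design is that a return value of 0 does not distinguish a literal "0" from an argument with trailing garbage or an empty string.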

git-pickaxe: work properly in a subdirectory. We forgot to add prefix to the given path. [jc: interestingly enough, Jeff King had the same idea after I pushed mine out to "pu", and his patch was cleaner, so I dropped mine.] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-02 08:22:49 +01:00			`static const char *add_prefix(const char *prefix, const char *path)`
			`{`
Make blame accept absolute paths Blame did not always use prefix_path. Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-01 05:07:04 +01:00			`return prefix_path(prefix, prefix ? strlen(prefix) : 0, path);`
git-pickaxe: work properly in a subdirectory. We forgot to add prefix to the given path. [jc: interestingly enough, Jeff King had the same idea after I pushed mine out to "pu", and his patch was cleaner, so I dropped mine.] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-02 08:22:49 +01:00			`}`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Parsing of (comma separated) one item in the -L option`
			`*/`
git-pickaxe: -L /regexp/,/regexp/ With this change, you can specify the beginning and the ending line of the range you wish to inspect with pattern matching. For example, these are equivalent with the git.git sources: git pickaxe -L 7,21 v1.4.0 -- commit.c git pickaxe -L '/^struct sort_node/,/^}/' v1.4.0 -- commit.c git pickaxe -L '7,/^}/' v1.4.0 -- commit.c git pickaxe -L '/^struct sort_node/,21' v1.4.0 -- commit.c Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-07 02:08:32 +01:00			`static const char *parse_loc(const char *spec,`
			`struct scoreboard *sb, long lno,`
			`long begin, long *ret)`
			`{`
			`char *term;`
			`const char *line;`
			`long num;`
			`int reg_error;`
			`regex_t regexp;`
			`regmatch_t match[1];`

git-pickaxe: allow "-L <something>,+N" With this, git pickaxe -L '/--progress/,+20' v1.4.0 -- pack-objects.c gives you 20 lines starting from the first occurrence of '--progress' in pack-objects, digging from v1.4.0 version. You can also say git pickaxe -L '/--progress/,-5' v1.4.0 -- pack-objects.c Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-08 01:20:02 +01:00			`/* Allow "-L <something>,+20" to mean starting at <something>`
			`* for 20 lines, or "-L <something>,-5" for 5 lines ending at`
			`* <something>.`
			`*/`
			`if (1 < begin && (spec[0] == '+' || spec[0] == '-')) {`
			`num = strtol(spec + 1, &term, 10);`
			`if (term != spec + 1) {`
			`if (spec[0] == '-')`
			`num = 0 - num;`
			`if (0 < num)`
			`*ret = begin + num - 2;`
			`else if (!num)`
			`*ret = begin;`
			`else`
			`*ret = begin + num;`
			`return term;`
			`}`
			`return spec;`
			`}`
git-pickaxe: -L /regexp/,/regexp/ With this change, you can specify the beginning and the ending line of the range you wish to inspect with pattern matching. For example, these are equivalent with the git.git sources: git pickaxe -L 7,21 v1.4.0 -- commit.c git pickaxe -L '/^struct sort_node/,/^}/' v1.4.0 -- commit.c git pickaxe -L '7,/^}/' v1.4.0 -- commit.c git pickaxe -L '/^struct sort_node/,21' v1.4.0 -- commit.c Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-07 02:08:32 +01:00			`num = strtol(spec, &term, 10);`
			`if (term != spec) {`
			`*ret = num;`
			`return term;`
			`}`
			`if (spec[0] != '/')`
			`return spec;`

			`/* it could be a regexp of form /.../ */`
			`for (term = (char *) spec + 1; *term && *term != '/'; term++) {`
			`if (*term == '\\')`
			`term++;`
			`}`
			`if (*term != '/')`
			`return spec;`

			`/* try [spec+1 .. term-1] as regexp */`
			`*term = 0;`
			`begin--; /* input is in human terms */`
			`line = nth_line(sb, begin);`

			`if (!(reg_error = regcomp(&regexp, spec + 1, REG_NEWLINE)) &&`
			`!(reg_error = regexec(&regexp, line, 1, match, 0))) {`
			`const char *cp = line + match[0].rm_so;`
			`const char *nline;`

			`while (begin++ < lno) {`
			`nline = nth_line(sb, begin);`
			`if (line <= cp && cp < nline)`
			`break;`
			`line = nline;`
			`}`
			`*ret = begin;`
			`regfree(&regexp);`
			`*term++ = '/';`
			`return term;`
			`}`
			`else {`
			`char errbuf[1024];`
			`regerror(reg_error, &regexp, errbuf, 1024);`
			`die("-L parameter '%s': %s", spec + 1, errbuf);`
			`}`
			`}`
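
The arithmetic in the `+N` / `-N` branch of `parse_loc()` is easier to follow with concrete numbers. The sketch below isolates it in a hypothetical helper, `relative_end()`, which takes the 1-based first line of the range and the signed count. It assumes, as in `prepare_blame_range()`, that `begin` is passed as `bottom + 1`. How the caller orders the two ends when the result is smaller than `bottom` is not shown in this part of the file.

```c
/*
 * Hypothetical helper: given the 1-based first line of a range and the
 * signed count from "+N" or "-N", return the other end of the range using
 * the same arithmetic as the branch above.
 */
long relative_end(long bottom, long count)
{
	long begin = bottom + 1; /* what the caller passes as "begin" */

	if (0 < count)
		return begin + count - 2; /* "+N": N lines starting at bottom */
	if (!count)
		return begin;
	return begin + count; /* "-N": N lines ending at bottom */
}
```

For example, `-L 7,+20` gives an end of 26 (lines 7 through 26, twenty lines), while `-L 7,-5` gives 3 (lines 3 through 7, five lines).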

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Parsing of -L option`
			`*/`
			`static void prepare_blame_range(struct scoreboard *sb,`
			`const char *bottomtop,`
			`long lno,`
			`long *bottom, long *top)`
			`{`
			`const char *term;`

			`term = parse_loc(bottomtop, sb, lno, 1, bottom);`
			`if (*term == ',') {`
			`term = parse_loc(term + 1, sb, lno, *bottom + 1, top);`
			`if (*term)`
git-pickaxe: retire pickaxe Just make it take over blame's place. Documentation and command have all stopped mentioning "git-pickaxe". The built-in synonym is left in the command table, so you can still say "git pickaxe", but it probably is a good idea to retire it as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-09 03:47:54 +01:00			`usage(blame_usage);`
			`}`
			`if (*term)`
			`usage(blame_usage);`
			`}`
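
To make the `bottom,top` arithmetic concrete, here is a small sketch that mirrors only the numeric branches of `parse_loc()` and the comma handling of `prepare_blame_range()`. The names `parse_numeric_loc` and `parse_numeric_range` are hypothetical, regexp specs are left out, and trailing garbage is reported with a return value instead of `usage()`. The caller is assumed to initialise `*bottom` and `*top` to 0 beforehand.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: numeric part of a -L location.
 * begin is the line the spec is relative to (1 for the first half).
 * Returns a pointer past the consumed text, or spec if nothing parsed.
 */
const char *parse_numeric_loc(const char *spec, long begin, long *ret)
{
	char *term;
	long num;

	assert(spec && ret);

	/* "+N" or "-N" is relative, but only for the second half. */
	if (1 < begin && (spec[0] == '+' || spec[0] == '-')) {
		num = strtol(spec + 1, &term, 10);
		if (term != spec + 1) {
			if (spec[0] == '-')
				num = 0 - num;
			if (0 < num)
				*ret = begin + num - 2; /* N lines starting at bottom */
			else if (!num)
				*ret = begin;
			else
				*ret = begin + num;
			return term;
		}
		return spec;
	}
	/* Otherwise an absolute line number. */
	num = strtol(spec, &term, 10);
	if (term != spec) {
		*ret = num;
		return term;
	}
	return spec;
}

/* Returns 0 on success, -1 if the spec has trailing garbage. */
int parse_numeric_range(const char *spec, long *bottom, long *top)
{
	const char *term = parse_numeric_loc(spec, 1, bottom);

	if (*term == ',')
		term = parse_numeric_loc(term + 1, *bottom + 1, top);
	return *term ? -1 : 0;
}
```

With this sketch, `10,+5` yields bottom 10 and top 14 (five lines starting at line 10), and `10,-3` yields bottom 10 and top 8, following the same arithmetic as the code above.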

blame: -b (blame.blankboundary) and --root (blame.showroot) When blame.blankboundary is set (or -b option is given), commit object names are blanked out in the "human readable" output format for boundary commits. When blame.showroot is not set (or --root is not given), the root commits are treated as boundary commits. The code still attributes the lines to them, but with -b their object names are not shown. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-12-18 23:04:38 +01:00			`static int git_blame_config(const char *var, const char *value)`
			`{`
			`if (!strcmp(var, "blame.showroot")) {`
			`show_root = git_config_bool(var, value);`
			`return 0;`
			`}`
			`if (!strcmp(var, "blame.blankboundary")) {`
			`blank_boundary = git_config_bool(var, value);`
			`return 0;`
			`}`
			`return git_default_config(var, value);`
			`}`

git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`static struct commit *fake_working_tree_commit(const char *path, const char *contents_from)`
			`{`
			`struct commit *commit;`
			`struct origin *origin;`
			`unsigned char head_sha1[20];`
Use strbuf API in apply, blame, commit-tree and diff Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-06 13:20:09 +02:00			`struct strbuf buf;`
			`const char *ident;`
			`time_t now;`
			`int size, len;`
			`struct cache_entry *ce;`
			`unsigned mode;`

			`if (get_sha1("HEAD", head_sha1))`
			`die("No such ref: HEAD");`

			`time(&now);`
			`commit = xcalloc(1, sizeof(*commit));`
			`commit->parents = xcalloc(1, sizeof(*commit->parents));`
			`commit->parents->item = lookup_commit_reference(head_sha1);`
			`commit->object.parsed = 1;`
			`commit->date = now;`
			`commit->object.type = OBJ_COMMIT;`

			`origin = make_origin(commit, path);`

Strbuf API extensions and fixes. * Add strbuf_rtrim to remove trailing spaces. * Add strbuf_insert to insert data at a given position. * Off-by one fix in strbuf_addf: strbuf_avail() does not counts the final \0 so the overflow test for snprintf is the strict comparison. This is not critical as the growth mechanism chosen will always allocate _more_ memory than asked, so the second test will not fail. It's some kind of miracle though. * Add size extension hints for strbuf_init and strbuf_read. If 0, default applies, else: + initial buffer has the given size for strbuf_init. + first growth checks it has at least this size rather than the default 8192. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-10 12:35:04 +02:00			`strbuf_init(&buf, 0);`
			`if (!contents_from || strcmp("-", contents_from)) {`
			`struct stat st;`
			`const char *read_from;`
			`unsigned long fin_size;`

			`if (contents_from) {`
			`if (stat(contents_from, &st) < 0)`
			`die("Cannot stat %s", contents_from);`
			`read_from = contents_from;`
			`}`
			`else {`
			`if (lstat(path, &st) < 0)`
			`die("Cannot lstat %s", path);`
			`read_from = path;`
			`}`
Cast 64 bit off_t to 32 bit size_t Some systems have sizeof(off_t) == 8 while sizeof(size_t) == 4. This implies that we are able to access and work on files whose maximum length is around 2^63-1 bytes, but we can only malloc or mmap somewhat less than 2^32-1 bytes of memory. On such a system an implicit conversion of off_t to size_t can cause the size_t to wrap, resulting in unexpected and exciting behavior. Right now we are working around all gcc warnings generated by the -Wshorten-64-to-32 option by passing the off_t through xsize_t(). In the future we should make xsize_t on such problematic platforms detect the wrapping and die if such a file is accessed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-07 02:44:37 +01:00			`fin_size = xsize_t(st.st_size);`
			`mode = canon_mode(st.st_mode);`
			`switch (st.st_mode & S_IFMT) {`
			`case S_IFREG:`
strbuf_read_file enhancement, and use it. * make strbuf_read_file take a size hint (works like strbuf_read) * use it in a couple of places. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-27 15:25:55 +02:00			`if (strbuf_read_file(&buf, read_from, st.st_size) != st.st_size)`
			`die("cannot open or read %s", read_from);`
			`break;`
			`case S_IFLNK:`
			`if (readlink(read_from, buf.buf, buf.alloc) != fin_size)`
			`die("cannot readlink %s", read_from);`
			`buf.len = fin_size;`
			`break;`
			`default:`
			`die("unsupported file type %s", read_from);`
			`}`
			`}`
			`else {`
			`/* Reading from stdin */`
			`contents_from = "standard input";`
			`mode = 0;`
			`if (strbuf_read(&buf, 0, 0) < 0)`
			`die("read error %s from stdin", strerror(errno));`
			`}`
safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. 
The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`convert_to_git(path, buf.buf, buf.len, &buf, 0);`
			`origin->file.ptr = buf.buf;`
			`origin->file.size = buf.len;`
			`pretend_sha1_file(buf.buf, buf.len, OBJ_BLOB, origin->blob_sha1);`
			`commit->util = origin;`

			`/*`
			`* Read the current index, replace the path entry with`
			`* origin->blob_sha1 without mucking with its mode or type`
			`* bits; we are not going to write this index out -- we just`
			`* want to run "diff-index --cached".`
			`*/`
			`discard_cache();`
			`read_cache();`

			`len = strlen(path);`
			`if (!mode) {`
			`int pos = cache_name_pos(path, len);`
			`if (0 <= pos)`
Make on-disk index representation separate from in-core one This converts the index explicitly on read and write to its on-disk format, allowing the in-core format to contain more flags, and be simpler. In particular, the in-core format is now host-endian (as opposed to the on-disk one that is network endian in order to be able to be shared across machines) and as a result we can dispense with all the htonl/ntohl on accesses to the cache_entry fields. This will make it easier to make use of various temporary flags that do not exist in the on-disk format. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2008-01-15 01:03:17 +01:00			`mode = active_cache[pos]->ce_mode;`
			`else`
			`/* Let's not bother reading from HEAD tree */`
			`mode = S_IFREG | 0644;`
			`}`
			`size = cache_entry_size(len);`
			`ce = xcalloc(1, size);`
			`hashcpy(ce->sha1, origin->blob_sha1);`
			`memcpy(ce->name, path, len);`
			`ce->ce_flags = create_ce_flags(len, 0);`
			`ce->ce_mode = create_ce_mode(mode);`
			`add_cache_entry(ce, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);`

			`/*`
			`* We are not going to write this out, so this does not matter`
			`* right now, but someday we might optimize diff-index --cached`
			`* with cache-tree information.`
			`*/`
			`cache_tree_invalidate_path(active_cache_tree, path);`

			`commit->buffer = xmalloc(400);`
			`ident = fmt_ident("Not Committed Yet", "not.committed.yet", NULL, 0);`
git-blame: Fix overrun in fake_working_tree_commit() git-blame would overflow commit->buffer when annotating files with long paths. Signed-off-by: Michael Spang <mspang@uwaterloo.ca> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-14 23:26:20 +02:00			`snprintf(commit->buffer, 400,`
			`"tree 0000000000000000000000000000000000000000\n"`
			`"parent %s\n"`
			`"author %s\n"`
			`"committer %s\n\n"`
			`"Version of %s from %s\n",`
			`sha1_to_hex(head_sha1),`
			`ident, ident, path, contents_from ? contents_from : path);`
			`return commit;`
			`}`
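
The fake commit buffer above is filled with `snprintf()` into a fixed 400-byte allocation, which keeps a long path from overrunning the buffer but silently truncates the message. A minimal sketch of the same formatting with explicit truncation detection follows; `format_fake_commit` and its parameters are hypothetical names for illustration, not functions from this file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a placeholder commit object into out.
 * Returns 0 if the whole text fit, 1 if it was truncated or an
 * output error occurred. out is always NUL-terminated on return.
 */
int format_fake_commit(char *out, size_t outsz, const char *parent_hex,
		       const char *ident, const char *path, const char *from)
{
	int n;

	assert(out && outsz > 0);

	n = snprintf(out, outsz,
		     "tree 0000000000000000000000000000000000000000\n"
		     "parent %s\n"
		     "author %s\n"
		     "committer %s\n\n"
		     "Version of %s from %s\n",
		     parent_hex, ident, ident, path, from ? from : path);

	/* snprintf() returns the length it wanted to write. */
	if (n < 0) {
		out[0] = '\0';
		return 1;
	}
	return (size_t)n >= outsz;
}
```

A caller that needs the full message could use the return value to allocate a larger buffer and retry, whereas the code above simply accepts the truncated text.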

			`int cmd_blame(int argc, const char **argv, const char *prefix)`
			`{`
			`struct rev_info revs;`
			`const char *path;`
			`struct scoreboard sb;`
			`struct origin *o;`
			`struct blame_entry *ent;`
git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00			`int i, seen_dashdash, unk, opt;`
			`long bottom, top, lno;`
			`int output_option = 0;`
blame: --show-stats for easier optimization work. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:52:43 +01:00			`int show_stats = 0;`
			`const char *revs_file = NULL;`
			`const char *final_commit_name = NULL;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
			`const char *bottomtop = NULL;`
			`const char *contents_from = NULL;`

annotate: fix for cvsserver. git-cvsserver does not want the boundary commits shown any differently. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-06 10:52:04 +01:00			`cmd_is_annotate = !strcmp(argv[0], "annotate");`

			`git_config(git_blame_config);`
git-pickaxe: do not keep commit buffer. We need the commit buffer data while generating the final result, but until then we do not need them. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 08:49:31 +02:00			`save_commit_buffer = 0;`

			`opt = 0;`
			`seen_dashdash = 0;`
			`for (unk = i = 1; i < argc; i++) {`
			`const char *arg = argv[i];`
			`if (*arg != '-')`
			`break;`
			`else if (!strcmp("-b", arg))`
			`blank_boundary = 1;`
			`else if (!strcmp("--root", arg))`
			`show_root = 1;`
			`else if (!strcmp(arg, "--show-stats"))`
			`show_stats = 1;`
			`else if (!strcmp("-c", arg))`
			`output_option |= OUTPUT_ANNOTATE_COMPAT;`
			`else if (!strcmp("-t", arg))`
			`output_option |= OUTPUT_RAW_TIMESTAMP;`
			`else if (!strcmp("-l", arg))`
			`output_option |= OUTPUT_LONG_OBJECT_NAME;`
blame -s: suppress author name and time. With this "git blame -b -s HEAD~n..HEAD" becomes a nicer way to review the result of recent changes in context. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 00:50:45 +02:00			`else if (!strcmp("-s", arg))`
			`output_option |= OUTPUT_NO_AUTHOR;`
git-blame -w: ignore whitespace When refactoring code to split one iteration of a too deeply nested loop into a separate function, it inevitably makes the indentation levels shallower (that's the sole point of such a refactoring). With "git blame -w", you can ignore such re-indentation and pass blame for such moved lines to the parent. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-10 03:14:56 +02:00			`else if (!strcmp("-w", arg))`
			`xdl_opts |= XDF_IGNORE_WHITESPACE;`
			`else if (!strcmp("-S", arg) && ++i < argc)`
			`revs_file = argv[i];`
prefixcmp(): fix-up mechanical conversion. Previous step converted use of strncmp() with literal string mechanically even when the result is only used as a boolean: if (!strncmp("foo", arg, 3)) ==> if (!(-prefixcmp(arg, "foo"))) This step manually cleans them up to read: if (!prefixcmp(arg, "foo")) Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-20 10:54:00 +01:00			`else if (!prefixcmp(arg, "-M")) {`
git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00			`opt |= PICKAXE_BLAME_MOVE;`
git-pickaxe: introduce heuristics to avoid "trivial" chunks This adds scoring logic to blame_entry to prevent blames on very trivial chunks (e.g. lots of empty lines, indent followed by a closing brace) from being passed down to unrelated lines in the parent. The current heuristics are quite simple and may need to be tweaked later, but we need to start somewhere. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 00:37:12 +02:00			`blame_move_score = parse_score(arg+2);`
			`}`
prefixcmp(): fix-up mechanical conversion. Previous step converted use of strncmp() with literal string mechanically even when the result is only used as a boolean: if (!strncmp("foo", arg, 3)) ==> if (!(-prefixcmp(arg, "foo"))) This step manually cleans them up to read: if (!prefixcmp(arg, "foo")) Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-20 10:54:00 +01:00			`else if (!prefixcmp(arg, "-C")) {`
blame: -C -C -C When you do this, existing "blame -C -C" would not find that the latter half of the file2 came from the existing file1: ... both file1 and file2 are tracked ... $ cat file1 >>file2 $ git add file1 file2 $ git commit This is because we avoid the expensive find-copies-harder code that makes unchanged file (in this case, file1) as a candidate for copy & paste source when annotating an existing file (file2). The third -C now allows it. However, this obviously makes the process very expensive. We've actually seen this patch before, but I dismissed it because it covers such a narrow (and arguably stupid) corner case. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-06 06:18:57 +02:00			`/*`
			`* -C enables copy from removed files;`
			`* -C -C enables copy from existing files, but only`
			`* when blaming a new file;`
			`* -C -C -C enables copy from existing files for`
			`* everybody`
			`*/`
			`if (opt & PICKAXE_BLAME_COPY_HARDER)`
			`opt |= PICKAXE_BLAME_COPY_HARDEST;`
git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00			`if (opt & PICKAXE_BLAME_COPY)`
			`opt |= PICKAXE_BLAME_COPY_HARDER;`
			`opt |= PICKAXE_BLAME_COPY | PICKAXE_BLAME_MOVE;`
git-pickaxe: introduce heuristics to avoid "trivial" chunks This adds scoring logic to blame_entry to prevent blames on very trivial chunks (e.g. lots of empty lines, indent followed by a closing brace) from being passed down to unrelated lines in the parent. The current heuristics are quite simple and may need to be tweaked later, but we need to start somewhere. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 00:37:12 +02:00			`blame_copy_score = parse_score(arg+2);`
git-pickaxe -C: blame cut-and-pasted lines. This completes the initial round of git-pickaxe. In addition to the detection of line movements we already have, this finds new lines that were created by moving or cutting-and-pasting lines from different files in the parent. With this, git pickaxe -f -n -C v1.4.0 -- revision.c finds that a major part of that file actually came from rev-list.c when Linus split the latter at commit ae563642 and blames them to earlier commits that touch rev-list.c. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:50:17 +02:00			`}`
prefixcmp(): fix-up mechanical conversion. Previous step converted use of strncmp() with literal string mechanically even when the result is only used as a boolean: if (!strncmp("foo", arg, 3)) ==> if (!(-prefixcmp(arg, "foo"))) This step manually cleans them up to read: if (!prefixcmp(arg, "foo")) Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-20 10:54:00 +01:00			`else if (!prefixcmp(arg, "-L")) {`
git-pickaxe: allow -Ln,m as well as -L n,m The command rejects -L1,10 as an invalid line range specifier and I got frustrated enough by it, so this makes it allow both forms of input. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-30 08:50:38 +01:00			`if (!arg[2]) {`
			`if (++i >= argc)`
git-pickaxe: retire pickaxe Just make it take over blame's place. Documentation and command have all stopped mentioning "git-pickaxe". The built-in synonym is left in the command table, so you can still say "git pickaxe", but it probably is a good idea to retire it as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-09 03:47:54 +01:00			`usage(blame_usage);`
git-pickaxe: allow -Ln,m as well as -L n,m The command rejects -L1,10 as an invalid line range specifier and I got frustrated enough by it, so this makes it allow both forms of input. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-30 08:50:38 +01:00			`arg = argv[i];`
			`}`
			`else`
			`arg += 2;`
git-pickaxe: -L /regexp/,/regexp/ With this change, you can specify the beginning and the ending line of the range you wish to inspect with pattern matching. For example, these are equivalent with the git.git sources: git pickaxe -L 7,21 v1.4.0 -- commit.c git pickaxe -L '/^struct sort_node/,/^}/' v1.4.0 -- commit.c git pickaxe -L '7,/^}/' v1.4.0 -- commit.c git pickaxe -L '/^struct sort_node/,21' v1.4.0 -- commit.c Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-07 02:08:32 +01:00			`if (bottomtop)`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`die("More than one '-L n,m' option given");`
git-pickaxe: -L /regexp/,/regexp/ With this change, you can specify the beginning and the ending line of the range you wish to inspect with pattern matching. For example, these are equivalent with the git.git sources: git pickaxe -L 7,21 v1.4.0 -- commit.c git pickaxe -L '/^struct sort_node/,/^}/' v1.4.0 -- commit.c git pickaxe -L '7,/^}/' v1.4.0 -- commit.c git pickaxe -L '/^struct sort_node/,21' v1.4.0 -- commit.c Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-07 02:08:32 +01:00			`bottomtop = arg;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`else if (!strcmp("--contents", arg)) {`
			`if (++i >= argc)`
			`usage(blame_usage);`
			`contents_from = argv[i];`
			`}`
git-blame --incremental This adds --incremental option to help GUI porcelains to show the result from git-blame incrementally. The output gives the origin information in the same format as the porcelain format. The first line has commit object name, the line number of the first line in the group in the original file, the line number of that file in the final image, and number of lines in the group. Then subsequent lines show the metainformation for the commit when the commit is shown for the first time, except the filename information is always shown (we cannot even make it conditional to -C option as blame always follows the renaming of the file wholesale). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:34:06 +01:00			`else if (!strcmp("--incremental", arg))`
			`incremental = 1;`
git-pickaxe: improve "best match" heuristics Instead of comparing number of lines matched, look at the matched characters and count alnums, so that we do not pass blame on not-so-interesting lines, such as an empty line and a line that is indentation followed by a closing brace. Add an option --score-debug to show the score of each blame_entry while we cook this further on the "next" branch. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 23:51:12 +02:00			`else if (!strcmp("--score-debug", arg))`
			`output_option |= OUTPUT_SHOW_SCORE;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`else if (!strcmp("-f", arg) ||`
			`!strcmp("--show-name", arg))`
			`output_option |= OUTPUT_SHOW_NAME;`
			`else if (!strcmp("-n", arg) ||`
			`!strcmp("--show-number", arg))`
			`output_option |= OUTPUT_SHOW_NUMBER;`
			`else if (!strcmp("-p", arg) ||`
			`!strcmp("--porcelain", arg))`
			`output_option |= OUTPUT_PORCELAIN;`
			`else if (!strcmp("--", arg)) {`
			`seen_dashdash = 1;`
			`i++;`
			`break;`
			`}`
			`else`
			`argv[unk++] = arg;`
			`}`
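
The `-C` handling in the option loop above escalates one level per occurrence: the higher-level bits are tested before the lower ones are set, so a single `-C` can never jump two levels. A minimal standalone sketch of that idea follows. The `SKETCH_*` constants and both helper functions are hypothetical stand-ins (the real `PICKAXE_BLAME_*` values are defined elsewhere in this file), and score parsing via `parse_score()` is left out.

```c
#include <string.h>

/* Hypothetical flag values, for illustration only. */
enum {
	SKETCH_MOVE         = 1 << 0,
	SKETCH_COPY         = 1 << 1,
	SKETCH_COPY_HARDER  = 1 << 2,
	SKETCH_COPY_HARDEST = 1 << 3
};

/* Return nonzero when s begins with prefix. */
static int sketch_starts_with(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/* Fold the "-M" and "-C" options found in argv[0..argc) into a flag word. */
static int sketch_collect_flags(int argc, const char **argv)
{
	int opt = 0;
	int i;

	for (i = 0; i < argc; i++) {
		if (sketch_starts_with(argv[i], "-M"))
			opt |= SKETCH_MOVE;
		else if (sketch_starts_with(argv[i], "-C")) {
			/*
			 * Test the higher levels first, so that each
			 * "-C" raises the level by exactly one step.
			 */
			if (opt & SKETCH_COPY_HARDER)
				opt |= SKETCH_COPY_HARDEST;
			if (opt & SKETCH_COPY)
				opt |= SKETCH_COPY_HARDER;
			opt |= SKETCH_COPY | SKETCH_MOVE;
		}
	}
	return opt;
}
```

Reversing the order of the two `if` tests would let the second `-C` set both the "harder" and "hardest" bits at once, which is why the checks run from the highest level down.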

git-pickaxe: introduce heuristics to avoid "trivial" chunks This adds scoring logic to blame_entry to prevent blames on very trivial chunks (e.g. lots of empty lines, indent followed by a closing brace) from being passed down to unrelated lines in the parent. The current heuristics are quite simple and may need to be tweaked later, but we need to start somewhere. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-21 00:37:12 +02:00			`if (!blame_move_score)`
			`blame_move_score = BLAME_DEFAULT_MOVE_SCORE;`
			`if (!blame_copy_score)`
			`blame_copy_score = BLAME_DEFAULT_COPY_SCORE;`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* We have collected options unknown to us in argv[1..unk]`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`* which are to be passed to revision machinery if we are`
Assorted typo fixes Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-04 05:49:16 +01:00			`* going to do the "bottom" processing.`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`*`
			`* The remaining are:`
			`*`
			`* (1) if seen_dashdash, it's either`
			`* "-options -- <path>" or`
			`* "-options -- <path> <rev>".`
			`* but the latter is allowed only if there are no`
			`* options that we passed to revision machinery.`
			`*`
			`* (2) otherwise, we may have "--" somewhere later and`
			`* might be looking at the first one of multiple 'rev'`
			`* parameters (e.g. " master ^next ^maint -- path").`
			`* See if there is a dashdash first, and give the`
			`* arguments before that to revision machinery.`
			`* After that there must be one 'path'.`
			`*`
			`* (3) otherwise, it's one of the three:`
			`* "-options <path> <rev>"`
			`* "-options <rev> <path>"`
			`* "-options <path>"`
			`* but again the first one is allowed only if`
			`* there are no options that we passed to revision`
			`* machinery.`
			`*/`
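
Case (2) in the comment above can be illustrated in isolation: find the first `--`, require exactly one path after it, and treat everything before it as revision arguments. This is only a sketch under simplifying assumptions. `struct sketch_split` and `sketch_split_at_dashdash()` are hypothetical names that do not appear in this file, and the sketch indexes a plain array of leftover arguments rather than the real `argv`.

```c
#include <string.h>

struct sketch_split {
	int path;    /* index of the path argument, or -1 if the form does not match */
	int rev_end; /* arguments in [0, rev_end) are revision arguments */
};

/* Split rest[0..n), expected to look like "<rev>... -- <path>". */
static struct sketch_split sketch_split_at_dashdash(int n, const char **rest)
{
	struct sketch_split r = { -1, 0 };
	int j;

	for (j = 0; j < n; j++)
		if (!strcmp(rest[j], "--"))
			break;
	if (j == n)
		return r; /* no "--" at all: this would be case (3) */
	if (j + 1 != n - 1)
		return r; /* exactly one path must follow "--" */
	r.path = j + 1;
	r.rev_end = j;
	return r;
}
```

For the example from the comment, `master ^next ^maint -- path`, the sketch reports the path at index 4 and the three revision arguments at indices 0 through 2.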

			`if (seen_dashdash) {`
			`/* (1) */`
			`if (argc <= i)`
git-pickaxe: retire pickaxe Just make it take over blame's place. Documentation and command have all stopped mentioning "git-pickaxe". The built-in synonym is left in the command table, so you can still say "git pickaxe", but it probably is a good idea to retire it as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-09 03:47:54 +01:00			`usage(blame_usage);`
git-pickaxe: work properly in a subdirectory. We forgot to add prefix to the given path. [jc: interestingly enough, Jeff King had the same idea after I pushed mine out to "pu", and his patch was cleaner, so I dropped mine.] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-02 08:22:49 +01:00			`path = add_prefix(prefix, argv[i]);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`if (i + 1 == argc - 1) {`
			`if (unk != 1)`
git-pickaxe: retire pickaxe Just make it take over blame's place. Documentation and command have all stopped mentioning "git-pickaxe". The built-in synonym is left in the command table, so you can still say "git pickaxe", but it probably is a good idea to retire it as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-09 03:47:54 +01:00			`usage(blame_usage);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`argv[unk++] = argv[i + 1];`
			`}`
			`else if (i + 1 != argc)`
			`/* garbage at end */`
git-pickaxe: retire pickaxe Just make it take over blame's place. Documentation and command have all stopped mentioning "git-pickaxe". The built-in synonym is left in the command table, so you can still say "git pickaxe", but it probably is a good idea to retire it as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-09 03:47:54 +01:00			`usage(blame_usage);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
			`else {`
			`int j;`
			`for (j = i; !seen_dashdash && j < argc; j++)`
			`if (!strcmp(argv[j], "--"))`
			`seen_dashdash = j;`
			`if (seen_dashdash) {`
git-blame: prevent argument parsing segfault The 3rd branch in builtin-blame.c should also check for lacking arguments. Running that in top dir does not trigger the problem because the 'prefix' is NULL. Signed-off-by: Tommi Kyntola <tommi.kyntola@ray.fi> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-16 09:50:58 +01:00			`/* (2) */`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`if (seen_dashdash + 1 != argc - 1)`
git-pickaxe: retire pickaxe Just make it take over blame's place. Documentation and command have all stopped mentioning "git-pickaxe". The built-in synonym is left in the command table, so you can still say "git pickaxe", but it probably is a good idea to retire it as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-09 03:47:54 +01:00			`usage(blame_usage);`
git-pickaxe: work properly in a subdirectory. We forgot to add prefix to the given path. [jc: interestingly enough, Jeff King had the same idea after I pushed mine out to "pu", and his patch was cleaner, so I dropped mine.] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-02 08:22:49 +01:00			`path = add_prefix(prefix, argv[seen_dashdash + 1]);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`for (j = i; j < seen_dashdash; j++)`
			`argv[unk++] = argv[j];`
			`}`
			`else {`
			`/* (3) */`
git-blame: prevent argument parsing segfault The 3rd branch in builtin-blame.c should also check for lacking arguments. Running that in top dir does not trigger the problem because the 'prefix' is NULL. Signed-off-by: Tommi Kyntola <tommi.kyntola@ray.fi> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-16 09:50:58 +01:00			`if (argc <= i)`
			`usage(blame_usage);`
git-pickaxe: work properly in a subdirectory. We forgot to add prefix to the given path. [jc: interestingly enough, Jeff King had the same idea after I pushed mine out to "pu", and his patch was cleaner, so I dropped mine.] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-02 08:22:49 +01:00			`path = add_prefix(prefix, argv[i]);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`if (i + 1 == argc - 1) {`
			`final_commit_name = argv[i + 1];`

			`/* if (unk == 1) we could be getting`
			`* old-style`
			`*/`
			`if (unk == 1 && !has_path_in_work_tree(path)) {`
git-pickaxe: work properly in a subdirectory. We forgot to add prefix to the given path. [jc: interestingly enough, Jeff King had the same idea after I pushed mine out to "pu", and his patch was cleaner, so I dropped mine.] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-02 08:22:49 +01:00			`path = add_prefix(prefix, argv[i + 1]);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`final_commit_name = argv[i];`
			`}`
			`}`
			`else if (i != argc - 1)`
git-pickaxe: retire pickaxe Just make it take over blame's place. Documentation and command have all stopped mentioning "git-pickaxe". The built-in synonym is left in the command table, so you can still say "git pickaxe", but it probably is a good idea to retire it as well. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-09 03:47:54 +01:00			`usage(blame_usage); /* garbage at end */`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
builtin-blame: set up the work_tree before the first file access We check in cmd_blame() if the specified path is there, but we failed to set up the working tree before that. While at it, make setup_work_tree() just return if it was run before. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-09 12:34:07 +01:00			`setup_work_tree();`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`if (!has_path_in_work_tree(path))`
			`die("cannot stat path %s: %s",`
			`path, strerror(errno));`
			`}`
			`}`

			`if (final_commit_name)`
			`argv[unk++] = final_commit_name;`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* Now we got rev and path. We do not want the path pruning`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`* but we may want "bottom" processing.`
			`*/`
git-blame: fix rev parameter handling. We lacked "--" termination in the underlying init_revisions() call which made it impossible to specify a revision that happens to have the same name as an existing file. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-15 22:52:25 +01:00			`argv[unk++] = "--"; /* terminate the rev name */`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`argv[unk] = NULL;`

			`init_revisions(&revs, NULL);`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`setup_revisions(unk, argv, &revs, NULL);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`memset(&sb, 0, sizeof(sb));`

git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* There must be one and only one positive commit in the`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`* revs->pending array.`
			`*/`
			`for (i = 0; i < revs.pending.nr; i++) {`
			`struct object *obj = revs.pending.objects[i].item;`
			`if (obj->flags & UNINTERESTING)`
			`continue;`
			`while (obj->type == OBJ_TAG)`
			`obj = deref_tag(obj, NULL, 0);`
			`if (obj->type != OBJ_COMMIT)`
			`die("Non commit %s?",`
			`revs.pending.objects[i].name);`
			`if (sb.final)`
			`die("More than one commit to dig from %s and %s?",`
			`revs.pending.objects[i].name,`
			`final_commit_name);`
			`sb.final = (struct commit *) obj;`
			`final_commit_name = revs.pending.objects[i].name;`
			`}`
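
The loop above enforces "at most one positive commit": entries flagged uninteresting are skipped, and a second positive entry is a fatal error. The same rule can be sketched without any git data structures. `struct sketch_pending` and `sketch_find_single_positive()` are hypothetical, and the sketch returns error codes where the real code calls `die()`; tag dereferencing is omitted.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of the pending object array. */
struct sketch_pending {
	const char *name;
	int uninteresting; /* nonzero for negative revs such as "^next" */
};

/*
 * Return the index of the single positive entry, -1 if there is
 * none, or -2 if more than one positive entry is present.
 */
static int sketch_find_single_positive(const struct sketch_pending *p, size_t nr)
{
	int found = -1;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (p[i].uninteresting)
			continue;
		if (found >= 0)
			return -2;
		found = (int)i;
	}
	return found;
}
```

A result of -1 corresponds to the branch that follows in the original code, where no positive commit was given and blame starts from the working tree file or from `--contents`.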

			`if (!sb.final) {`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* "--not A B -- path" without anything positive;`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`* do not default to HEAD, but use the working tree`
			`* or "--contents".`
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`*/`
Make git-blame fail when working tree is needed and we're not in one Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-03 13:22:55 +01:00			`setup_work_tree();`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`sb.final = fake_working_tree_commit(path, contents_from);`
			`add_pending_object(&revs, &(sb.final->object), ":");`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`}`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`else if (contents_from)`
			`die("Cannot use --contents with final commit object name");`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
git-blame: somewhat better commenting. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 02:36:22 +01:00			`/*`
			`* If we have bottom, this will mark the ancestors of the`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`* bottom commits we would reach while traversing as`
			`* uninteresting.`
			`*/`
check return code of prepare_revision_walk A failure in prepare_revision_walk can be caused by a not parseable object. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-18 08:31:56 +01:00			`if (prepare_revision_walk(&revs))`
			`die("revision walk setup failed");`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`if (is_null_sha1(sb.final->object.sha1)) {`
			`char *buf;`
			`o = sb.final->util;`
			`buf = xmalloc(o->file.size + 1);`
			`memcpy(buf, o->file.ptr, o->file.size + 1);`
			`sb.final_buf = buf;`
			`sb.final_buf_size = o->file.size;`
			`}`
			`else {`
			`o = get_origin(&sb, sb.final, path);`
			`if (fill_blob_sha1(o))`
			`die("no such path %s in %s", path, final_commit_name);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`	sb.final_buf = read_sha1_file(o->blob_sha1, &type,`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`				      &sb.final_buf_size);`
blame: check return value from read_sha1_file() Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-08-25 10:26:20 +02:00			`	if (!sb.final_buf)`
			`		die("Cannot read blob %s for path %s",`
			`		    sha1_to_hex(o->blob_sha1),`
			`		    path);`
git-blame: no rev means start from the working tree file. Warning: this changes the semantics. This makes "git blame" without any positive rev to start digging from the working tree copy, which is made into a fake commit whose sole parent is the HEAD. It also adds --contents <file> option to pretend as if the working tree copy has the contents of the named file. You can use '-' to make the command read from the standard input. If you want the command to start annotating from the HEAD commit, you need to explicitly give HEAD parameter. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-30 10:11:08 +01:00			`}`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`num_read_blob++;`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`lno = prepare_lines(&sb);`

git-pickaxe: -L /regexp/,/regexp/ With this change, you can specify the beginning and the ending line of the range you wish to inspect with pattern matching. For example, these are equivalent with the git.git sources: git pickaxe -L 7,21 v1.4.0 -- commit.c git pickaxe -L '/^struct sort_node/,/^}/' v1.4.0 -- commit.c git pickaxe -L '7,/^}/' v1.4.0 -- commit.c git pickaxe -L '/^struct sort_node/,21' v1.4.0 -- commit.c Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-07 02:08:32 +01:00			`bottom = top = 0;`
			`if (bottomtop)`
			`	prepare_blame_range(&sb, bottomtop, lno, &bottom, &top);`
			`if (bottom && top && top < bottom) {`
			`	long tmp;`
			`	tmp = top; top = bottom; bottom = tmp;`
			`}`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`if (bottom < 1)`
			`	bottom = 1;`
			`if (top < 1)`
			`	top = lno;`
			`bottom--; /* make bottom 0-based; top stays the 1-based last line */`
			`if (lno < top)`
			`	die("file %s has only %lu lines", path, lno);`

			`ent = xcalloc(1, sizeof(*ent));`
			`ent->lno = bottom;`
			`ent->num_lines = top - bottom;`
			`ent->suspect = o;`
			`ent->s_lno = bottom;`

			`sb.ent = ent;`
			`sb.path = path;`

			`if (revs_file && read_ancestry(revs_file))`
			`	die("reading graft file %s failed: %s",`
			`	    revs_file, strerror(errno));`

blame: use .mailmap unconditionally There really isn't any point in turning off .mailmap. The number of mailmap lookups are bounded by number of lines in the target file, and the real blame processing is much more expensive. If it turns out to be too costly, we should optimize the mailmap lookup itself, instead of avoiding the call. If the author information of commits of the project are relatively clean, .mailmap would have only small number of entries, and the overhead of looking it up will not be high. On the other hand, if the author information is really screwed up that a good .mailmap needs to be maintained to run shortlog, giving uncleaned names in blame output is not helpful at all either. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-03 08:58:14 +02:00			`read_mailmap(&mailmap, ".mailmap", NULL);`
Apply mailmap in git-blame output. This makes git-blame to use the same mailmap used by git-shortlog. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-27 09:42:15 +02:00
Delay pager setup in git blame This avoids to launch the pager when git blame fails for any reason. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-03 13:22:53 +01:00			`if (!incremental)`
			`	setup_pager();`

git-pickaxe -M: blame line movements within a file. This makes pickaxe more intelligent than the classic blame. A typical example is a change that moves one static C function from lower part of the file to upper part of the same file, because you added a new caller in the middle. The versions in the parent and the child would look like this: parent child A static foo() { B ... C } D A E B F C G D static foo() { ... call foo(); ... E } F H G H With the classic blame algorithm, we can blame lines A B C D E F G and H to the parent. The child is guilty of introducing the line "... call foo();", and the blame is placed on the child. However, the classic blame algorithm fails to notice that the implementation of foo() at the top of the file is not new, and moved from the lower part of the parent. This commit introduces detection of such line movements, and correctly blames the lines that were simply moved in the file to the parent. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 03:49:30 +02:00			`assign_blame(&sb, &revs, opt);`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00
git-blame --incremental This adds --incremental option to help GUI porcelains to show the result from git-blame incrementally. The output gives the origin information in the same format as the porcelain format. The first line has commit object name, the line number of the first line in the group in the original file, the line number of that file in the final image, and number of lines in the group. Then subsequent lines show the metainformation for the commit when the commit is shown for the first time, except the filename information is always shown (we cannot even make it conditional to -C option as blame always follows the renaming of the file wholesale). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-01-28 10:34:06 +01:00			`if (incremental)`
			`	return 0;`

git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`coalesce(&sb);`

			`if (!(output_option & OUTPUT_PORCELAIN))`
			`	find_alignment(&sb, &output_option);`

			`output(&sb, output_option);`
			`free((void *)sb.final_buf);`
			`for (ent = sb.ent; ent; ) {`
			`	struct blame_entry *e = ent->next;`
			`	free(ent);`
			`	ent = e;`
			`}`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00
blame: --show-stats for easier optimization work. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:52:43 +01:00			`if (show_stats) {`
git-pickaxe: optimize by avoiding repeated read_sha1_file(). It turns out that pickaxe reads the same blob repeatedly while blame can reuse the blob already read for the parent when handling a child commit when it's parent's turn to pass its blame to the grandparent. Have a cache in the origin structure to keep the blob there, which will be garbage collected when the origin loses the last reference to it. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-05 20:51:41 +01:00			`	printf("num read blob: %d\n", num_read_blob);`
			`	printf("num get patch: %d\n", num_get_patch);`
			`	printf("num commits: %d\n", num_commits);`
			`}`
git-pickaxe: blame rewritten. Currently it does what git-blame does, but only faster. More importantly, its internal structure is designed to support content movement (aka cut-and-paste) more easily by allowing more than one paths to be taken from the same commit. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-10-20 01:00:04 +02:00			`return 0;`
			`}`
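
The -L range handling in the function above (swap a reversed range, fill in defaults, then convert to a 0-based start and a line count) can be sketched in isolation as follows. This is only an illustrative sketch: `struct line_range` and `normalize_range()` are hypothetical names that do not exist in the git sources, and the sketch returns -1 where the original calls `die()`.

```c
/* Hypothetical container for a normalized range; not a type from git. */
struct line_range {
	long start;	/* 0-based index of the first line */
	long count;	/* number of lines in the range */
};

/*
 * Hypothetical helper mirroring the steps in the function above.
 * bottom and top are 1-based line numbers, with 0 meaning "not given";
 * lno is the number of lines in the file.  Returns 0 on success, or -1
 * where the original code dies because the range runs past the end of
 * the file.
 */
int normalize_range(long bottom, long top, long lno, struct line_range *out)
{
	/* A reversed range such as "21,7" is treated as "7,21". */
	if (bottom && top && top < bottom) {
		long tmp = top;
		top = bottom;
		bottom = tmp;
	}
	/* Missing ends default to the first and the last line. */
	if (bottom < 1)
		bottom = 1;
	if (top < 1)
		top = lno;
	/* Make bottom 0-based; top stays the 1-based last line. */
	bottom--;
	if (lno < top)
		return -1;
	out->start = bottom;
	out->count = top - bottom;
	return 0;
}
```

Under these assumptions, a request for lines 7 through 21 of a 100-line file yields a start of 6 and a count of 15, which corresponds to the `ent->lno` and `ent->num_lines` values assigned to the initial blame entry.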