
mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-05 08:47:56 +01:00

Author	SHA1	Message	Date
Junio C Hamano	8ae92e6389	rename diff_free_filespec_data_large() to diff_free_filespec_blob() Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-10-02 21:02:09 -07:00
Jeff King	eede7b7d11	diffcore-rename: cache file deltas We find rename candidates by computing a fingerprint hash of each file, and then comparing those fingerprints. There are inherently O(n^2) comparisons, so it pays in CPU time to hoist the (rather expensive) computation of the fingerprint out of that loop (or to cache it once we have computed it once). Previously, we didn't keep the filespec information around because then we had the potential to consume a great deal of memory. However, instead of keeping all of the filespec data, we can instead just keep the fingerprint. This patch implements and uses diff_free_filespec_data_large to accomplish that goal. We also have to change estimate_similarity not to needlessly repopulate the filespec data when we already have the hash. Practical tests showed 4.5x speedup for a 10% memory usage increase. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-10-02 21:02:03 -07:00
Linus Torvalds	0024a54923	Fix the rename detection limit checking This adds more proper rename detection limits. Instead of just checking the limit against the number of potential rename destinations, we verify that the rename matrix (which is what really matters) doesn't grow ridiculously large, and we also make sure that we don't overflow when doing the matrix size calculation. This also changes the default limits from unlimited, to a rename matrix that is limited to 100 entries on a side. You can raise it with the config entry, or by using the "-l<n>" command line flag, but at least the default is now a sane number that avoids spending lots of time (and memory) in situations that likely don't merit it. The choice of default value is of course very debatable. Limiting the rename matrix to a 100x100 size will mean that even if you have just one obvious rename, but you also create (or delete) 10,000 files, the rename matrix will be so big that we disable the heuristics. Sounds reasonable to me, but let's see if people hit this (and, perhaps more importantly, actually care) in real life. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-09-14 12:12:57 -07:00
Junio C Hamano	e1bc8dc66d	Merge branch 'jc/diffcore' * jc/diffcore: diffcore-delta.c: Ignore CR in CRLF for text files diffcore-delta.c: update the comment on the algorithm. diffcore_filespec: add is_binary diffcore_count_changes: pass diffcore_filespec	2007-07-02 01:45:12 -07:00
Junio C Hamano	d8c3d03a0b	diffcore_count_changes: pass diffcore_filespec We may want to use richer information on the data we are dealing with in this function, so instead of passing a buffer address and length, just pass the diffcore_filespec structure. Existing callers always call this function with parameters taken from a filespec anyway, so there is no functionality changes. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-06-30 20:51:31 -07:00
René Scharfe	cfc0aef1ff	diffcore-rename: don't change similarity index based on basename equality This implements a suggestion from Johannes. It uses a separate field in struct diff_score to keep the result of the file name comparison in the rename detection logic. This reverts the value of the similarity index to be a function of file contents, only, and basename comparison is only used to decide between files with equal amounts of content changes. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-06-24 23:12:31 -07:00
Johannes Schindelin	0ce396431e	diffcore-rename: favour identical basenames When there are several candidates for a rename source, and one of them has an identical basename to the rename target, take that one. Noticed by Govind Salinas, posted by Shawn O. Pearce, partial patch by Linus Torvalds. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-06-22 22:43:51 -07:00
Junio C Hamano	50b2b53897	diff -M: release the preimage candidate blobs after rename detection. We released the postimage candidate blobs after we are done to reduce memory pressure. Do the same for preimage candidate blobs. Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-05-07 15:54:32 -07:00
Shawn O. Pearce	dc49cd769b	Cast 64 bit off_t to 32 bit size_t Some systems have sizeof(off_t) == 8 while sizeof(size_t) == 4. This implies that we are able to access and work on files whose maximum length is around 2^63-1 bytes, but we can only malloc or mmap somewhat less than 2^32-1 bytes of memory. On such a system an implicit conversion of off_t to size_t can cause the size_t to wrap, resulting in unexpected and exciting behavior. Right now we are working around all gcc warnings generated by the -Wshorten-64-to-32 option by passing the off_t through xsize_t(). In the future we should make xsize_t on such problematic platforms detect the wrapping and die if such a file is accessed. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-03-07 11:15:26 -08:00
Shawn O. Pearce	7da41f48c8	Bypass expensive content comparison during rename detection. When comparing file contents during the second loop through a rename detection attempt we can skip the expensive byte-by-byte comparison if both source and destination files have valid SHA1 values. This improves performance by avoiding either an expensive open/mmap to read the working tree copy, or an expensive inflate of a blob object. Unfortunately we still have to at least initialize the sizes of the source and destination files even if the SHA1 values don't match. Failing to initialize the sizes causes a number of test cases to fail and start reporting different copy/rename behavior than was expected. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-12-14 02:40:33 -08:00
Junio C Hamano	2f3f8b218a	git-pickaxe: rename detection optimization The idea is that we are interested in renaming into only one path, so we do not care about renames that happen elsewhere. Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-11-04 12:18:12 -08:00
David Rientjes	a89fccd281	Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char *sha1, const unsigned char *sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-08-17 14:23:53 -07:00
Junio C Hamano	ef677686ef	diff.c: do not use pathname comparison to tell renames The final output from diff used to compare pathnames between preimage and postimage to tell if the filepair is a rename/copy. By explicitly marking the filepair created by diffcore_rename(), the output routine, resolve_rename_copy(), does not have to do so anymore. This helps feeding a filepair that has different pathnames in one and two elements to the diff machinery (most notably, comparing two blobs). Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-08-03 14:41:53 -07:00
Junio C Hamano	17e6019a2a	diffcore-rename: try matching up renames without populating filespec first. Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-07-06 17:03:52 -07:00
Junio C Hamano	fc5807190e	diffcore-rename: fix merging back a broken pair. When a broken pair is matched up by rename detector to be merged back, we do not want to say it is "dissimilar" with the similarity index. The output should just say they were changed, taking the break score left by the earlier diffcore-break run if any. Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-04-08 20:32:41 -07:00
Linus Torvalds	90bd932c81	Fix up diffcore-rename scoring The "score" calculation for diffcore-rename was totally broken. It scaled "score" as score = src_copied * MAX_SCORE / dst->size; which means that you got a 100% similarity score even if src and dest were different, if just every byte of dst was copied from src, even if source was much larger than dst (eg we had copied 85% of the bytes, but _deleted_ the remaining 15%). That's clearly bogus. We should do the score calculation relative not to the destination size, but to the max size of the two. This seems to fix it. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-03-12 23:02:00 -08:00
Junio C Hamano	2821104db7	diffcore-delta: make the hash a bit denser. To reduce wasted memory, wait until the hash fills up more densely before we rehash. This reduces the working set size a bit further. Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-03-12 17:26:32 -08:00
Junio C Hamano	c06c79667c	diffcore-rename: somewhat optimized. This changes diffcore-rename to reuse statistics information gathered during similarity estimation, and updates the hashtable implementation used to keep track of the statistics to be denser. This seems to give better performance. Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-03-12 03:22:10 -08:00
Junio C Hamano	1706306a54	diffcore-rename: similarity estimator fix. The "similarity" logic was giving added material way too much negative weight. What we wanted to see was how similar the post-change image was compared to the pre-change image, so the natural definition of similarity is how much common things are there, relative to the post-change image's size. This simplifies things a lot. Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-03-02 22:12:33 -08:00
Junio C Hamano	65416758cd	diffcore-rename: split out the delta counting code. This is to rework diffcore break/rename/copy detection code so that it is not affected when the deltifier code gets improved. Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-02-28 20:20:04 -08:00
Junio C Hamano	09a5d72d8e	diffcore-rename: plug memory leak. Spotted by Nicolas Pitre. Signed-off-by: Junio C Hamano <junkio@cox.net>	2006-02-22 19:45:48 -08:00
Eric Wong	7d6fb370bc	short circuit out of a few places where we would allocate zero bytes dietlibc versions of malloc, calloc and realloc all return NULL if they're told to allocate 0 bytes, which causes the x* wrappers to die(). There are several more places where these calls could end up asking for 0 bytes, too... Maybe simply not die()-ing in the x* wrappers if 0/NULL is returned when the requested size is zero is a safer and easier way to go. Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <junkio@cox.net>	2005-12-26 08:59:21 -08:00
Junio C Hamano	9f70b80692	rename detection with -M100 means "exact renames only". When the user is interested in pure renames, there is no point doing the similarity scores. This changes the score argument parsing to special case -M100 (otherwise, it is a precision scaled value 0 <= v < 1 and would mean 0.1, not 1.0 --- if you do mean 0.1, you can say -M1), and optimizes the diffcore_rename transformation to only look at pure renames in that case. Signed-off-by: Junio C Hamano <junkio@cox.net>	2005-11-21 12:21:24 -08:00
Junio C Hamano	3299c6f6a8	diff: make default rename detection limit configurable. A while ago, a rename-detection limit logic was implemented as a response to this thread: http://marc.theaimsgroup.com/?l=git&m=112413080630175 where gitweb was found to be using a lot of time and memory to detect renames on huge commits. git-diff family takes -l<num> flag, and if the number of paths that are rename destination candidates (i.e. new paths with -M, or modified paths with -C) are larger than that number, skips rename/copy detection even when -M or -C is specified on the command line. This commit makes the rename detection limit easier to use. You can have: [diff] renamelimit = 30 in your .git/config file to specify the default rename detection limit. You can override this from the command line; giving 0 means 'unlimited': git diff -M -l0 We might want to change the default behaviour, when you do not have the configuration, to limit it to say 20 paths or so. This would also help the diffstat generation after a big 'git pull'. Signed-off-by: Junio C Hamano <junkio@cox.net>	2005-11-15 15:08:27 -08:00
Junio C Hamano	8082d8d305	Diff: -l<num> to limit rename/copy detection. When many paths are modified, rename detection takes a lot of time. The new option -l<num> can be used to disable rename detection when more than <num> paths are possibly created as renames. Signed-off-by: Junio C Hamano <junkio@cox.net>	2005-09-24 23:50:44 -07:00
Junio C Hamano	5098bafb75	Plug diff leaks. It is a bit embarrassing that it took this long for a fix since the problem was first reported on Aug 13th. Message-ID: <87y876gl1r.wl@mail2.atmark-techno.com> From: Yasushi SHOJI <yashi@atmark-techno.com> Newsgroups: gmane.comp.version-control.git Subject: [patch] possible memory leak in diff.c::diff_free_filepair() Date: Sat, 13 Aug 2005 19:58:56 +0900 This time I used valgrind to make sure that it does not overeagerly discard memory that is still being used. Signed-off-by: Junio C Hamano <junkio@cox.net>	2005-09-15 16:13:43 -07:00
Junio C Hamano	6bac10d89d	Fix copy marking from diffcore-rename. When (A,B) ==> (B,C) rename-copy was detected, we incorrectly said that C was created by copying B. This is because we only check if the path of rename/copy source still exists in the resulting tree to see if the file is renamed out of existence. In this case, the new B is created by copying or renaming A, so the original B is lost and we should say C is a rename of B not a copy of B. Signed-off-by: Junio C Hamano <junkio@cox.net>	2005-09-10 12:42:32 -07:00
Junio C Hamano	75c660ac93	[PATCH] Use enhanced diff_delta() in the similarity estimator. The diff_delta() interface was extended to reject generating too big a delta while we were working on the packed GIT archive format. Take advantage of that when generating delta in the similarity estimator used in diffcore-rename.c Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-28 17:13:32 -07:00
Linus Torvalds	75c42d8cc3	Add a "max_size" parameter to diff_delta() Anything that generates a delta to see if two objects are close usually isn't interested in the delta if it ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens.	2005-06-25 19:30:20 -07:00
Junio C Hamano	2210100ac0	[PATCH] Fix rename/copy when dealing with temporarily broken pairs. When rename/copy uses a file that was broken by diffcore-break as the source, and the broken filepair gets merged back later, the output was mislabeled as a rename. In this case, the source file ends up staying in the output, so we should label it as a copy instead. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-12 20:40:19 -07:00
Junio C Hamano	0e3994fa97	[PATCH] diff: Clean up diff_scoreopt_parse(). This cleans up the diff_scoreopt_parse() function that is used to parse the fractional notation the -B, -C and -M options take. The callers are modified to check for errors and complain. Earlier they silently ignored malformed input and fell back on the default. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-03 11:23:03 -07:00
Junio C Hamano	355e76a4a3	[PATCH] Tweak count-delta interface Make it return copied source and insertion separately, so that later implementation of heuristics can use them more flexibly. This does not change the heuristics implemented in diffcore-rename nor diffcore-break in any way. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-03 11:23:03 -07:00
Junio C Hamano	f345b0a066	[PATCH] Add -B flag to diff-* brothers. A new diffcore transformation, diffcore-break.c, is introduced. When the -B flag is given, a patch that represents a complete rewrite is broken into a deletion followed by a creation. This makes it easier to review such a complete rewrite patch. The -B flag takes the same syntax as the -M and -C flags to specify the minimum amount of non-source material the resulting file needs to have to be considered a complete rewrite, and defaults to 99% if not specified. As the new test t4008-diff-break-rewrite.sh demonstrates, if a file is a complete rewrite, it is broken into a delete/create pair, which can further be subjected to the usual rename detection if -M or -C is used. For example, if file0 gets completely rewritten to make it as if it were rather based on file1 which itself disappeared, the following happens: The original change looks like this: file0 --> file0' (quite different from file0) file1 --> /dev/null After diffcore-break runs, it would become this: file0 --> /dev/null /dev/null --> file0' file1 --> /dev/null Then diffcore-rename matches them up: file1 --> file0' The internal score values are finer grained now. Earlier maximum of 10000 has been raised to 60000; there is no user visible changes but there is no reason to waste available bits. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-30 10:35:49 -07:00
Junio C Hamano	2cd68882ee	[PATCH] diff: fix the culling of unneeded delete record. The commit `15d061b435` [PATCH] Fix the way diffcore-rename records unremoved source. still leaves unneeded delete records in its output stream by mistake, which was covered up by having an extra check to turn such a delete into a no-op downstream. Fix the check in the diffcore-rename to simplify the output routine. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-30 10:35:49 -07:00
Junio C Hamano	01c4e70f63	[PATCH] diff: code clean-up and removal of rename hack. A new macro, DIFF_PAIR_RENAME(), is introduced to distinguish a filepair that is a rename/copy (the definition of which is src and dst are different paths, of course). This removes the hack used in the record_rename_pair() to always put a non-zero value in the score field. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-30 10:35:49 -07:00
Junio C Hamano	f0c6b2a2fd	[PATCH] Optimize diff-tree -[CM] --stdin This attempts to optimize "diff-tree -[CM] --stdin", which compares successive tree pairs. This optimization does not make much sense for other commands in the diff-* brothers. When reading from --stdin and using rename/copy detection, the patch makes diff-tree read the current index file first. This is done to reuse the optimization used by diff-cache in the non-cached case. The similarity estimator can avoid expanding a blob if the index says what is in the work tree has an exact copy of that blob already expanded. Another optimization the patch makes is to check only file sizes first to terminate similarity estimation early. In order for this to work, it needs a way to tell the size of the blob without expanding it. Since an obvious way of doing it, which is to keep all the blobs previously used in the memory, is too costly, it does so by keeping the filesize for each object it has already seen in memory. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-29 11:17:44 -07:00
Junio C Hamano	15d061b435	[PATCH] Fix the way diffcore-rename records unremoved source. An earlier version of diffcore-rename used to keep unmodified filepair in its output so that the last stage of the processing that tells renames from copies can make all of rename/copy to copies. However this had a bad interaction with other diffcore filters that wanted to run after diffcore-rename, in that such unmodified filepair must be retained for proper distinction between renames and copies to happen. This patch fixes the problem by changing the way diffcore-rename records the information needed to distinguish "all are copies" case and "the last one is a rename" case. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-29 11:17:43 -07:00
Junio C Hamano	1a0756ffe4	[PATCH] Remove unused rank field from diff_core structure. This removes a field that is no longer used from diff_score structure. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-29 11:17:43 -07:00
Junio C Hamano	226406f693	[PATCH] Introduce diff_free_filepair() function. This introduces a new function to free a common data structure, and plugs some leaks. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-29 11:17:43 -07:00
Junio C Hamano	a00d7d106a	[PATCH] Fix math thinko in similarity estimator. The math to reject delta that is too big was confused. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-29 11:17:43 -07:00
Junio C Hamano	8597697458	[PATCH] Update rename/copy similarity estimator. The second round similarity estimator simply used the size of the xdelta itself to estimate the extent of damage. This patch keeps that logic to detect big insertions to terminate the check early, but otherwise looks at the generated delta in order to estimate the extent of edit more accurately. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-24 17:47:05 -07:00
Junio C Hamano	25d5ea410f	[PATCH] Redo rename/copy detection logic. Earlier implementation had a major screw-up in the memory management area. Rename/copy logic sometimes borrowed a pointer to a structure without any provision for downstream to determine which pointer is shared and which is not. This resulted in the later clean-up code to sometimes double free such structure, resulting in a segfault. This made -M and -C useless. Another problem the earlier implementation had was that it reordered the patches, and forced the logic to differentiate renames and copies to depend on that particular order. This problem was fixed by teaching rename/copy detection logic not to do any reordering, and rename-copy differentiator not to depend on the order of the patches. The diffs will leave rename/copy detector in the same destination path order as the patch that was fed into it. Some test vectors have been reordered to accommodate this change. It also adds a sanity check logic to the human-readable diff-raw output to detect paths with embedded TAB and LF characters, which cannot be expressed with that format. This idea came up during a discussion with Chris Wedgwood. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-24 01:26:26 -07:00
Junio C Hamano	bceafe752c	[PATCH] Fix diff-pruning logic which was running prune too early. For later stages to reorder patches, pruning logic and rename detection logic should not decide which delete to discard (because another entry said it will take over the file as a rename) until the very end. Also fix some tests that were assuming the earlier "last one is rename or keep everything else is copy" semantics of diff-raw format, which no longer is true. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-23 19:17:06 -07:00
Junio C Hamano	f7c1512af8	[PATCH] Rename/copy detection fix. The rename/copy detection logic in earlier round was only good enough to show patch output and discussion on the mailing list about the diff-raw format updates revealed many problems with it. This patch fixes all the ones known to me, without making things I want to do later impossible, mostly related to patch reordering. (1) Earlier rename/copy detector determined which one is rename and which one is copy too early, which made it impossible to later introduce diffcore transformers to reorder patches. This patch fixes it by moving that logic to the very end of the processing. (2) Earlier output routine diff_flush() was pruning all the "no-change" entries indiscriminatingly. This was done due to my false assumption that one of the requirements in the diff-raw output was not to show such an entry (which resulted in my incorrect comment about "diff-helper never being able to be equivalent to built-in diff driver"). My special thanks go to Linus for correcting me about this. When we produce diff-raw output, for the downstream to be able to tell renames from copies, sometimes it _is_ necessary to output "no-change" entries, and this patch adds diffcore_prune() function for doing it. (3) Earlier diff_filepair structure was trying to be not too specific about rename/copy operations, but the purpose of the structure was to record one or two paths, which _was_ indeed about rename/copy. This patch discards xfrm_msg field which was trying to be generic for this wrong reason, and introduces a couple of fields (rename_score and rename_rank) that are explicitly specific to rename/copy logic. One thing to note is that the information in a single diff_filepair structure _still_ does not distinguish renames from copies, and it is deliberately so. This is to allow patches to be reordered in later stages. 
(4) This patch also adds some tests about diff-raw format output and makes sure that necessary "no-change" entries appear on the output. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-23 11:49:30 -07:00
Junio C Hamano	60896c7bfe	[PATCH] Be careful with symlinks when detecting renames and copies. Earlier round was not treating symbolic links carefully enough, and would have produced diff output that renamed/copied then edited the contents of a symbolic link, which made no practical sense. Change it to detect only pure renames. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-23 11:49:30 -07:00
Junio C Hamano	6b14d7faf0	[PATCH] Diffcore updates. This moves the path selection logic from individual programs to a new diffcore transformer (diff-tree still needs to have its own for performance reasons). Also the header printing code in diff-tree was tweaked not to produce anything when pickaxe is in effect and there is nothing interesting to report. An interesting example is the following in the GIT archive itself: $ git-whatchanged -p -C -S'or something in a real script' Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-22 10:17:50 -07:00
Junio C Hamano	26dee0adfc	[PATCH] Add the code to set default minimum score back in. When the minimum score is specified as 0 (meaning "use default value"), set it to the default as we are told. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-22 09:46:06 -07:00
Junio C Hamano	cd1870edb6	[PATCH] Fix tweak in similarity estimator. There was a screwy math bug in the estimator that confused what -C1 meant and what -C9 meant, only in one of the early "cheap" check, which resulted in quite confusing behaviour. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-22 09:38:26 -07:00
Junio C Hamano	81e50eabf0	[PATCH] The diff-raw format updates. Update the diff-raw format as Linus and I discussed, except that it does not use sequence of underscore '_' letters to express nonexistence. All '0' mode is used for that purpose instead. The new diff-raw format can express rename/copy, and the earlier restriction that -M and -C _must_ be used with the patch format output is no longer necessary. The patch makes -M and -C flags independent of -p flag, so you need to say git-whatchanged -M -p to get the diff/patch format. Updated are both documentations and tests. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-21 22:49:19 -07:00
Junio C Hamano	38c6f78059	[PATCH] Prepare diffcore interface for diff-tree header suppression. This does not actually suppress the extra headers when pickaxe is used, but prepares enough support for diff-tree to implement it. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-21 22:49:19 -07:00
Junio C Hamano	58b103f55d	[PATCH] Tweak diffcore-rename heuristics. The heuristics so far was to compare file size change and xdelta size against the average of file size before and after the change. This patch uses the smaller of pre- and post- change file size instead. It also makes a very small performance fix. I didn't measure it; I do not expect it to make any practical difference, but while scanning an already sorted list, breaking out in the middle is the right thing. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-21 16:22:57 -07:00
Junio C Hamano	52e9578985	[PATCH] Introducing software archaeologist's tool "pickaxe". This steals the "pickaxe" feature from JIT and makes it available to the bare Plumbing layer. From the command line, the user gives a string he is interested in. Using the diff-core infrastructure previously introduced, it filters the differences to limit the output only to the diffs between <src> and <dst> where the string appears only in one but not in the other. For example: $ ./git-rev-list HEAD | ./git-diff-tree -Sdiff-tree-helper --stdin -M would show the diffs that touch the string "diff-tree-helper". In real software-archaeologist application, you would typically look for a few to several lines of code and see where that code came from. The "pickaxe" module runs after "rename/copy detection" module, so it even crosses the file rename boundary, as the above example demonstrates. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-21 09:58:03 -07:00
Junio C Hamano	427dcb4bca	[PATCH] Diff overhaul, adding half of copy detection. This introduces the diff-core, the layer between the diff-tree family and the external diff interface engine. The calls to the interface diff-tree family uses (diff_change and diff_addremove) have not changed and will not change. The purpose of the diff-core layer is to provide an infrastructure to transform the set of differences sent from the applications, before sending them to the external diff interface. The recently introduced rename detection code has been rewritten to use the diff-core facility. When applications send in separate creates and deletes, matching ones are transformed into a single rename-and-edit diff, and sent out to the external diff interface as such. This patch also enhances the rename detection code further to be able to detect copies. Currently this happens only as long as copy sources appear as part of the modified files, but there already is enough provision for callers to report unmodified files to diff-core, so that they can be also used as copy source candidates. Extending the callers this way will be done in a separate patch. Please see and marvel at how well this works by trying out the newly added t/t4003-diff-rename-1.sh test script. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-21 09:58:03 -07:00
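
Commit eede7b7d11 keeps each file's fingerprint after freeing its contents, so the O(n^2) comparison loop never recomputes it. Below is a minimal C sketch of that caching pattern. The struct filespec shown, get_fingerprint(), free_blob_keep_fingerprint(), and the single FNV-1a hash standing in for a fingerprint are all hypothetical simplifications, not git's types. Git's real fingerprint, built in diffcore-delta.c, is as far as we know a hash table of per-chunk counts rather than one hash value.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified filespec: blob contents plus a cached fingerprint. */
struct filespec {
	char *data;               /* blob contents; may be freed early */
	size_t size;
	int has_fingerprint;      /* nonzero once fingerprint is valid */
	unsigned int fingerprint;
};

/* Counts how often the "expensive" fingerprint computation actually runs. */
static unsigned int fingerprint_computations;

/* 32-bit FNV-1a hash, standing in for the real (costlier) fingerprint. */
static unsigned int fnv1a(const char *p, size_t n)
{
	unsigned int h = 2166136261u;
	size_t i;

	for (i = 0; i < n; i++) {
		h ^= (unsigned char)p[i];
		h *= 16777619u;
	}
	return h;
}

/* Compute the fingerprint on first use; afterwards return the cached value. */
unsigned int get_fingerprint(struct filespec *fs)
{
	if (!fs->has_fingerprint) {
		fs->fingerprint = fnv1a(fs->data, fs->size);
		fs->has_fingerprint = 1;
		fingerprint_computations++;
	}
	return fs->fingerprint;
}

/* Release the (large) blob but keep the (small) fingerprint. */
void free_blob_keep_fingerprint(struct filespec *fs)
{
	free(fs->data);
	fs->data = NULL;
	fs->size = 0;
}
```

In a matrix of n sources and m destinations, each file's fingerprint is then computed once rather than once per pair. This matches the trade the commit reports: a little extra memory per file in exchange for far less repeated work.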
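
Commit 0024a54923 limits the size of the rename matrix rather than the number of destinations alone. The matrix size is candidate destinations times candidate sources, and the calculation is guarded against overflow. The C sketch below shows one way to write such a check, with these assumptions:
- rename_matrix_too_large() is a hypothetical name.
- Treating a limit of 0 or less as "unlimited" follows the -l0 convention from commit 3299c6f6a8.
- Doing the arithmetic in uint64_t is our own choice for overflow safety, not necessarily what git does.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return 1 if a num_dst x num_src rename matrix
 * exceeds limit x limit entries (limit <= 0 means "no limit").
 * Both products are computed in 64 bits, so int-sized counts cannot
 * overflow.
 */
int rename_matrix_too_large(int num_dst, int num_src, int limit)
{
	uint64_t matrix, max_matrix;

	if (limit <= 0 || num_dst <= 0 || num_src <= 0)
		return 0;
	matrix = (uint64_t)num_dst * (uint64_t)num_src;
	max_matrix = (uint64_t)limit * (uint64_t)limit;
	return matrix > max_matrix;
}
```

Take the default limit of 100 and one obvious rename plus 10,000 newly created files. That gives a 10,001 x 1 matrix, which exceeds the 10,000-entry budget, so rename_matrix_too_large(10001, 1, 100) returns 1. This is the trade-off the commit message calls debatable.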
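
Commits 0ce396431e and cfc0aef1ff keep the similarity index purely content-based. An identical basename is used only to break ties between equally similar candidates. Below is a C sketch of that ordering. The struct diff_score layout shown, basename_same() and sort_scores() are illustrative assumptions. The comparator (content score first, then name score) is our reading of the change, not a copy of git's code.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative candidate record: indices plus content and name scores. */
struct diff_score {
	int src;        /* index of rename source candidate */
	int dst;        /* index of rename destination */
	int score;      /* content similarity, 0..MAX_SCORE */
	int name_score; /* 1 if basenames match, else 0 */
};

/* Return the part of path after the last '/', or path itself. */
static const char *base_of(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

int basename_same(const char *a, const char *b)
{
	return !strcmp(base_of(a), base_of(b));
}

/* Higher content score first; equal scores fall back to name_score. */
static int score_compare(const void *a_, const void *b_)
{
	const struct diff_score *a = a_, *b = b_;

	if (a->score == b->score)
		return b->name_score - a->name_score;
	return b->score - a->score;
}

void sort_scores(struct diff_score *scores, size_t n)
{
	qsort(scores, n, sizeof(*scores), score_compare);
}
```

Because name_score only matters when score is equal, a basename match can never turn a weaker content match into the winner. That is the point of cfc0aef1ff.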
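
Commit dc49cd769b routes off_t values through xsize_t() and notes that it should eventually detect wrapping. One way such a check could look, as a sketch only: reject negative values, then compare against SIZE_MAX in a type wide enough for both. Git's actual definition may differ. Printing to stderr and exiting with status 128 is our stand-in for git's die().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Sketch: convert an off_t to size_t, refusing values that would wrap.
 * Negative lengths are rejected explicitly, because on platforms where
 * uintmax_t and size_t have the same width the cast alone would not
 * catch them.
 */
static inline size_t xsize_t(off_t len)
{
	if (len < 0 || (uintmax_t)len > (uintmax_t)SIZE_MAX) {
		fprintf(stderr, "fatal: cannot handle files this big\n");
		exit(128);
	}
	return (size_t)len;
}
```

Where both types are 64 bits, the upper-bound test never fires. It only matters on the systems the commit describes, with an 8-byte off_t and a 4-byte size_t.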
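
Commit 90bd932c81 changes the similarity score to divide by the larger of the two file sizes instead of the destination size. Below is a small C sketch contrasting the two formulas. MAX_SCORE is taken as 60000, the internal maximum mentioned in commit f345b0a066. Both function names are hypothetical, and returning 0 for two empty files is our own assumption.

```c
#define MAX_SCORE 60000ULL

/* Old (buggy) scaling: relative to the destination size only. */
unsigned long long score_by_dst(unsigned long long src_copied,
				unsigned long long src_size,
				unsigned long long dst_size)
{
	(void)src_size; /* ignored, which is exactly the bug */
	return dst_size ? src_copied * MAX_SCORE / dst_size : 0;
}

/* Fixed scaling: relative to the larger of the two sizes (0 if both empty). */
unsigned long long score_by_max(unsigned long long src_copied,
				unsigned long long src_size,
				unsigned long long dst_size)
{
	unsigned long long max_size = src_size > dst_size ? src_size : dst_size;

	return max_size ? src_copied * MAX_SCORE / max_size : 0;
}
```

Take the commit's own example: a 100-byte source whose first 85 bytes became an 85-byte destination, with the other 15% deleted. The old formula gives 85 * 60000 / 85 = 60000, a "100% similar" score. The fixed one gives 85 * 60000 / 100 = 51000, or 85%.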
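
Commit 9f70b80692 describes how the digits after -M are read. They form a fraction whose scale is set by the number of digits: -M5 is 0.5 and -M1 is 0.1. -M100 is special-cased to mean 1.0, that is, exact renames only. Below is a C sketch of that parsing rule, with these assumptions:
- parse_score_digits() is hypothetical.
- Returning 0 for "no digits, use the default" follows commit 26dee0adfc.
- Returning -1 for malformed input echoes the error checking added in 0e3994fa97.
- The nine-digit cap is our own overflow guard.

```c
#include <ctype.h>
#include <string.h>

#define MAX_SCORE 60000

/*
 * Parse the digits following -M/-C/-B into an internal score in
 * [0, MAX_SCORE]. Returns 0 for an empty string (caller uses its
 * default) and -1 for malformed input.
 */
int parse_score_digits(const char *s)
{
	long long num = 0, scale = 1;
	int ndigits = 0;

	if (!*s)
		return 0;
	if (!strcmp(s, "100"))
		return MAX_SCORE;  /* special case: exact renames only */
	for (; *s; s++) {
		if (!isdigit((unsigned char)*s) || ++ndigits > 9)
			return -1;
		num = num * 10 + (*s - '0');
		scale *= 10;       /* "5" -> 5/10, "50" -> 50/100, ... */
	}
	return (int)(num * MAX_SCORE / scale);
}
```

Without the special case, -M100 would parse as 100/1000 = 0.1. That is why the commit tells users who really want 0.1 to write -M1.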
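
Commit f345b0a066 says -B treats a change as a complete rewrite when the result contains at least a minimum share of non-source material, 99% by default. The C sketch below applies only that stated rule. should_break() is hypothetical. Git's actual diffcore-break heuristics may also weigh deleted material, so treat this as an illustration of the threshold, not the real implementation.

```c
#define MAX_SCORE 60000ULL
#define DEFAULT_BREAK_SCORE (MAX_SCORE * 99 / 100) /* 99% */

/*
 * Return 1 if dst should be split into delete + create: the bytes of
 * dst not copied from src must reach break_score/MAX_SCORE of dst.
 * Returns 0 for an empty dst or an inconsistent src_copied.
 */
int should_break(unsigned long long src_copied,
		 unsigned long long dst_size,
		 unsigned long long break_score)
{
	unsigned long long new_material;

	if (!dst_size || src_copied > dst_size)
		return 0;
	new_material = dst_size - src_copied;
	/* new_material/dst_size >= break_score/MAX_SCORE, cross-multiplied */
	return new_material * MAX_SCORE >= break_score * dst_size;
}
```

Once a pair is broken, its halves become ordinary delete and create entries. That is what lets -M pair file1 with file0' in the commit's example.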
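
Commit f0c6b2a2fd checks file sizes first so that similarity estimation can stop early. Take the max-size scoring later introduced in 90bd932c81, and assume copied bytes never exceed the smaller file. Then min_size * MAX_SCORE / max_size is an upper bound on the score. If that bound is already below the minimum score, the contents never need to be read. Below is a C sketch of such a bound check. size_rules_out_rename() is hypothetical, and the exact inequality git uses may differ.

```c
#define MAX_SCORE 60000ULL

/*
 * Return 1 if the sizes alone prove that the pair cannot reach
 * minimum_score. Assumes copied bytes <= min(src_size, dst_size),
 * so the best possible score is min_size * MAX_SCORE / max_size.
 */
int size_rules_out_rename(unsigned long long src_size,
			  unsigned long long dst_size,
			  unsigned long long minimum_score)
{
	unsigned long long max_size = src_size > dst_size ? src_size : dst_size;
	unsigned long long min_size = src_size < dst_size ? src_size : dst_size;

	if (!max_size)
		return 0; /* two empty files: nothing to rule out */
	/* min_size * MAX_SCORE / max_size < minimum_score, cross-multiplied */
	return min_size * MAX_SCORE < minimum_score * max_size;
}
```

For example, a 100-byte file and a 1000-byte file can reach at most 10% similarity. A 50% minimum (30000) therefore rejects the pair from the sizes alone.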
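
Commit 52e9578985 keeps only the filepairs where the search string appears on one side but not the other. Below is a C sketch of that filter exactly as the commit message states it. pickaxe_match() is hypothetical, and a NULL side stands for a nonexistent file (/dev/null). Current git documents -S as looking for changes in the number of occurrences of the string, so this shows the rule described here, not today's behavior.

```c
#include <string.h>

/* Return 1 if needle occurs in text; a NULL text is a missing file. */
static int contains(const char *text, const char *needle)
{
	return text && strstr(text, needle) != NULL;
}

/* Keep the pair if the string is in exactly one of the two sides. */
int pickaxe_match(const char *one, const char *two, const char *needle)
{
	return contains(one, needle) != contains(two, needle);
}
```

The filter runs after rename/copy detection, so a renamed file arrives as one pair whose sides have different paths. A string that merely moved along with the file is therefore not reported.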
