mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-15 13:43:45 +01:00

Author	SHA1	Message	Date
Kirill Smelkov	72441af7c4	tree-diff: rework diff_tree() to generate diffs for multiparent cases as well Previously diff_tree(), which is now named ll_diff_tree_sha1(), was generating diff_filepair(s) for two trees t1 and t2, and that was usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes a commit introduces. In Git, however, we have fundamentally built flexibility in that a commit can have many parents - 1 for a plain commit, 2 for a simple merge, but also more than 2 for merging several heads at once. For merges there is a so called combine-diff, which shows diff, a merge introduces by itself, omitting changes done by any parent. That works through first finding paths, that are different to all parents, and then showing generalized diff, with separate columns for +/- for each parent. The code lives in combine-diff.c . There is an impedance mismatch, however, in that a commit could generally have any number of parents, and that while diffing trees, we divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there is no special casing for multiple parents commits in e.g. revision-walker . That impedance mismatch hurts performance badly for generating combined diffs - in "combine-diff: optimize combine_diff_path sets intersection" I've already removed some slowness from it, but from the timings provided there, it could be seen, that combined diffs still cost more than an order of magnitude more cpu time, compared to diff for usual commits, and that would only be an optimistic estimate, if we take into account that for e.g. linux.git there is only one merge for several dozens of plain commits. That slowness comes from the fact that currently, while generating combined diff, a lot of time is spent computing diff(commit,commit^2) just to only then intersect that huge diff to almost small set of files from diff(commit,commit^1). That's because at present, to compute combine-diff, for first finding paths, that "every parent touches", we use the following combine-diff property/definition: D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn) (w.r.t. paths) where D(A,P1...Pn) is combined diff between commit A, and parents Pi and D(A,Pi) is usual two-tree diff Pi..A So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n 1-parent diffs and intersecting results will be slow. And usually, for linux.git and other topic-based workflows, that D(A,P2) is huge, because, if merge-base of A and P2, is several dozens of merges (from A, via first parent) below, that D(A,P2) will be diffing sum of merges from several subsystems to 1 subsystem. The solution is to avoid computing n 1-parent diffs, and to find changed-to-all-parents paths via scanning A's and all Pi's trees simultaneously, at each step comparing their entries, and based on that comparison, populate paths result, and deduce we could skip recursing into subdirectories, if at least for 1 parent, sha1 of that dir tree is the same as in A. That would save us from doing significant amount of needless work. Such approach is very similar to what diff_tree() does, only there we deal with scanning only 2 trees simultaneously, and for n+1 tree, the logic is a bit more complex: D(T,P1...Pn) calculation scheme ------------------------------- D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn) (regarding resulting paths set) D(T,Pj) - diff between T..Pj D(T,P1...Pn) - combined diff from T to parents P1,...,Pn We start from all trees, which are sorted, and compare their entries in lock-step: T P1 Pn - - - \|t\| \|p1\| \|pn\| \|-\| \|--\| ... \|--\| imin = argmin(p1...pn) \| \| \| \| \| \| \|-\| \|--\| \|--\| \|.\| \|. \| \|. \| . . . . . . at any time there could be 3 cases: 1) t < p[imin]; 2) t > p[imin]; 3) t = p[imin]. Schematic deduction of what every case means, and what to do, follows: 1) t < p[imin] -> ∀j t ∉ Pj -> "+t" ∈ D(T,Pj) -> D += "+t"; t↓ 2) t > p[imin] 2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓ 2.2) ∀i pi = p[imin] -> pi ∉ T -> "-pi" ∈ D(T,Pi) -> D += "-p[imin]"; ∀i pi↓ 3) t = p[imin] 3.1) ∃j: pj > p[imin] -> "+t" ∈ D(T,Pj) -> only pi=p[imin] remains to investigate 3.2) pi = p[imin] -> investigate δ(t,pi) \| \| v 3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø -> ⎧δ(t,pi) - if pi=p[imin] -> D += ⎨ ⎩"+t" - if pi>p[imin] in any case t↓ ∀ pi=p[imin] pi↓ ~ For comparison, here is how diff_tree() works: D(A,B) calculation scheme ------------------------- A B - - \|a\| \|b\| a < b -> a ∉ B -> D(A,B) += +a a↓ \|-\| \|-\| a > b -> b ∉ A -> D(A,B) += -b b↓ \| \| \| \| a = b -> investigate δ(a,b) a↓ b↓ \|-\| \|-\| \|.\| \|.\| . . . . ~~~~~~~~ This patch generalizes diff tree-walker to work with arbitrary number of parents as described above - i.e. now there is a resulting tree t, and some parents trees tp[i] i=[0..nparent). The generalization builds on the fact that usual diff D(A,B) is by definition the same as combined diff D(A,[B]), so if we could rework the code for common case and make it be not slower for nparent=1 case, usual diff(t1,t2) generation will not be slower, and multiparent diff tree-walker would greatly benefit generating combine-diff. What we do is as follows: 1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be a paths generator (new name diff_tree_paths()), with each generated path being `struct combine_diff_path` with info for path, new sha1,mode and for every parent which sha1,mode it was in it. 2) From that info, we can still generate usual diff queue with struct diff_filepairs, via "exporting" generated combine_diff_path, if we know we run for nparent=1 case. (see emit_diff() which is now named emit_diff_first_parent_only()) 3) In order for diff_can_quit_early(), which checks DIFF_OPT_TST(opt, HAS_CHANGES)) to work, that exporting have to be happening not in bulk, but incrementally, one diff path at a time. For such consumers, there is a new callback in diff_options introduced: ->pathchange(opt, struct combine_diff_path ) which, if set to !NULL, is called for every generated path. (see new compat ll_diff_tree_sha1() wrapper around new paths generator for setup) 4) The paths generation itself, is reworked from previous ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation scheme" provided above: On the start we allocate [nparent] arrays in place what was earlier just for one parent tree. then we just generalize loops, and comparison according to the algorithm. Some notes(): 1) alloca(), for small arrays, is used for "runs not slower for nparent=1 case than before" goal - if we change it to xmalloc()/free() the timings get ~1% worse. For alloca() we use just-introduced xalloca/xalloca_free compatibility wrappers, so it should not be a portability problem. 2) For every parent tree, we need to keep a tag, whether entry from that parent equals to entry from minimal parent. For performance reasons I'm keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ. Not doing so, we'd need to alloca another [nparent] array, which hurts performance. 3) For emitted paths, memory could be reused, if we know the path was processed via callback and will not be needed later. We use efficient hand-made realloc-style path_appendnew(), that saves us from ~1-1.5% of potential additional slowdown. 4) goto(s) are used in several places, as the code executes a little bit faster with lowered register pressure. Also - we should now check for FIND_COPIES_HARDER not only when two entries names are the same, and their hashes are equal, but also for a case, when a path was removed from some of all parents having it. The reason is, if we don't, that path won't be emitted at all (see "a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants all paths - with diff or without - to be emitted, to be later analyzed for being copies sources. The new check is only necessary for nparent >1, as for nparent=1 case xmin_eqtotal always =1 =nparent, and a path is always added to diff as removal. ~~~~~~~~ Timings for # without -c, i.e. testing only nparent=1 case `git log --raw --no-abbrev --no-renames` before and after the patch are as follows: navy.git linux.git v3.10..v3.11 before 0.611s 1.889s after 0.619s 1.907s slowdown 1.3% 0.9% This timings show we did no harm to usual diff(tree1,tree2) generation. From the table we can see that we actually did ~1% slowdown, but I think I've "earned" that 1% in the previous patch ("tree-diff: reuse base str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case, net timings stays approximately the same. The output also stayed the same. (*) If we revert 1)-4) to more usual techniques, for nparent=1 case, we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as "do no harm for nparent=1 case" rule. For linux.git, combined diff will run an order of magnitude faster and appropriate timings will be provided in the next commit, as we'll be taking advantage of the new diff tree-walker for combined-diff generation there. P.S. and combined diff is not some exotic/for-play-only stuff - for example for a program I write to represent Git archives as readonly filesystem, there is initial scan with `git log --reverse --raw --no-abbrev --no-renames -c` to extract log of what was created/changed when, as a result building a map {} sha1 -> in which commit (and date) a content was added that `-c` means also show combined diff for merges, and without them, if a merge is non-trivial (merges changes from two parents with both having separate changes to a file), or an evil one, the map will not be full, i.e. some valid sha1 would be absent from it. That case was my initial motivation for combined diffs speedup. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-04-07 14:40:46 -07:00
Kirill Smelkov	61f76a3612	Portable alloca for Git In the next patch we'll have to use alloca() for performance reasons, but since alloca is non-standardized and is not portable, let's have a trick with compatibility wrappers: 1. at configure time, determine, do we have working alloca() through alloca.h, and define #define HAVE_ALLOCA_H if yes. 2. in code #ifdef HAVE_ALLOCA_H # include <alloca.h> # define xalloca(size) (alloca(size)) # define xalloca_free(p) do {} while(0) #else # define xalloca(size) (xmalloc(size)) # define xalloca_free(p) (free(p)) #endif and use it like func() { p = xalloca(size); ... xalloca_free(p); } This way, for systems, where alloca is available, we'll have optimal on-stack allocations with fast executions. On the other hand, on systems, where alloca is not available, this gracefully fallbacks to xmalloc/free. Both autoconf and config.mak.uname configurations were updated. For autoconf, we are not bothering considering cases, when no alloca.h is available, but alloca() works some other way - its simply alloca.h is available and works or not, everything else is deep legacy. For config.mak.uname, I've tried to make my almost-sure guess for where alloca() is available, but since I only have access to Linux it is the only change I can be sure about myself, with relevant to other changed systems people Cc'ed. NOTE SunOS and Windows had explicit -DHAVE_ALLOCA_H in their configurations. I've changed that to now-common HAVE_ALLOCA_H=YesPlease which should be correct. Cc: Brandon Casey <drafnel@gmail.com> Cc: Marius Storm-Olsen <mstormo@gmail.com> Cc: Johannes Sixt <j6t@kdbg.org> Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de> Cc: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Cc: Gerrit Pape <pape@smarden.org> Cc: Petr Salinger <Petr.Salinger@seznam.cz> Cc: Jonathan Nieder <jrnieder@gmail.com> Acked-by: Thomas Schwinge <thomas@codesourcery.com> (GNU Hurd changes) Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-27 11:54:01 -07:00
Kirill Smelkov	12cd81743d	tree-diff: reuse base str(buf) memory on sub-tree recursion Instead of allocating it all the time for every subtree in ll_diff_tree_sha1, let's allocate it once in diff_tree_sha1, and then all callee just use it in stacking style, without memory allocations. This should be faster, and for me this change gives the following slight speedups for git log --raw --no-abbrev --no-renames --format='%H' navy.git linux.git v3.10..v3.11 before 0.618s 1.903s after 0.611s 1.889s speedup 1.1% 0.7% Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-27 11:52:35 -07:00
Kirill Smelkov	b9081a6574	tree-diff: no need to call "full" diff_tree_sha1 from show_path() As described in previous commit, when recursing into sub-trees, we can use lower-level tree walker, since its interface is now sha1 based. The change is ok, because diff_tree_sha1() only invokes ll_diff_tree_sha1(), and also, if base is empty, try_to_follow_renames(). But base is not empty here, as we have added a path and '/' before recursing. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-27 11:50:29 -07:00
Kirill Smelkov	52894e7095	tree-diff: rework diff_tree interface to be sha1 based In the next commit this will allow to reduce intermediate calls, when recursing into subtrees - at that stage we know only subtree sha1, and it is natural for tree walker to start from that phase. For now we do diff_tree show_path diff_tree_sha1 diff_tree ... and the change will allow to reduce it to diff_tree show_path diff_tree Also, it will allow to omit allocating strbuf for each subtree, and just reuse the common strbuf via playing with its len. The above-mentioned improvements go in the next 2 patches. The downside is that try_to_follow_renames(), if active, we cause re-reading of 2 initial trees, which was negligible based on my timings, and which is outweighed cogently by the upsides. NOTE To keep with the current interface and semantics, I needed to rename the function from diff_tree() to diff_tree_sha1(). As diff_tree_sha1() was already used, and the function we are talking here is its more low-level helper, let's use convention for prefixing such helpers with "ll_". So the final renaming is diff_tree() -> ll_diff_tree_sha1() Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-27 11:49:35 -07:00
Kirill Smelkov	ad6f3cc7d2	tree-diff: diff_tree() should now be static We reworked all its users to use the functionality through diff_tree_sha1 variant in recent patches (see "tree-diff: allow diff_tree_sha1 to accept NULL sha1" and what comes next). diff_tree() is now not used outside tree-diff.c - make it static. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-26 14:30:47 -07:00
Kirill Smelkov	6ca844e9f5	tree-diff: remove special-case diff-emitting code for empty-tree cases While walking trees, we iterate their entries from lowest to highest in sort order, so empty tree means all entries were already went over. If we artificially assign +infinity value to such tree "entry", it will go after all usual entries, and through the usual driver loop we will be taking the same actions, which were hand-coded for special cases, i.e. t1 empty, t2 non-empty pathcmp(+∞, t2) -> +1 show_path(/t1=/NULL, t2); /* = t1 > t2 case in main loop / t1 non-empty, t2-empty pathcmp(t1, +∞) -> -1 show_path(t1, /t2=/NULL); / = t1 < t2 case in main loop */ In other words when we have t1 and t2, we return a sign that tells the caller to indicate the "earlier" one to be emitted, and by returning the sign that causes the non-empty side to be emitted, we will automatically cause the entries from the remaining side to be emitted, without attempting to touch the empty side at all. We can teach tree_entry_pathcmp() to pretend that an empty tree has an element that sorts after anything else to achieve this. Right now we never go to when compared tree descriptors are both infinity, as this condition is checked in the loop beginning as finishing criteria, but will do so in the future, when there will be several parents iterated simultaneously, and some pair of them would run to the end. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-26 14:25:11 -07:00
Kirill Smelkov	1a27a15452	tree-diff: simplify tree_entry_pathcmp Since an earlier "Finally switch over tree descriptors to contain a pre-parsed entry", we can safely access all tree_desc->entry fields directly instead of first "extracting" them through tree_entry_extract. Use it. The code generated stays the same - only it now visually looks cleaner. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-20 15:04:32 -07:00
Kirill Smelkov	5acabd84a6	tree-diff: show_path prototype is not needed anymore We moved all action-taking code below show_path() in recent HEAD~~ (tree-diff: move all action-taking code out of compare_tree_entry). Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-20 15:04:32 -07:00
Kirill Smelkov	9bc0619655	tree-diff: rename compare_tree_entry -> tree_entry_pathcmp Since previous commit, this function does not compare entry hashes, and mode are compared fully outside of it. So what it does is compare entry names and DIR bit in modes. Reflect this in its name. Add documentation stating the semantics, and move the note about files/dirs comparison to it. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-20 15:04:32 -07:00
Kirill Smelkov	903bba68ab	tree-diff: move all action-taking code out of compare_tree_entry() - let it do only comparison. This way the code is cleaner and more structured - cmp function only compares, and the driver takes action based on comparison result. There should be no change in performance, as effectively, we just move if series from on place into another, and merge it to was-already-there same switch/if, so the result is maybe a little bit faster. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-20 15:04:31 -07:00
Kirill Smelkov	5dfb2bbd8d	tree-diff: don't assume compare_tree_entry() returns -1,0,1 It does, but we'll be reworking it in the next patch after it won't, and besides it is better to stick to standard strcmp/memcmp/base_name_compare/etc... convention, where comparison function returns <0, =0, >0 Regarding performance, comparing for <0, =0, >0 should be a little bit faster, than switch, because it is just 1 test-without-immediate instruction and then up to 3 conditional branches, and in switch you have up to 3 tests with immediate and up to 3 conditional branches. No worry, that update_tree_entry(t2) is duplicated for =0 and >0 - it will be good after we'll be adding support for multiparent walker and will stay that way. =0 case goes first, because it happens more often in real diffs - i.e. paths are the same. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-20 15:04:31 -07:00
Kirill Smelkov	d00e980c22	tree-diff: consolidate code for emitting diffs and recursion in one place Currently both compare_tree_entry() and show_entry() invoke opt diff callbacks (opt->add_remove() and opt->change()), and also they both have code which decides whether to recurse into sub-tree, and whether to emit a tree as separate entry if DIFF_OPT_TREE_IN_RECURSIVE is set. I.e. we have code duplication and logic scattered on two places. Let's consolidate it - all diff emiting code and recurion logic moves to show_entry, which is now named as show_path, because it shows diff for a path, based on up to two tree entries, with actual diff emitting code being kept in new helper emit_diff() for clarity. What we have as the result, is that compare_tree_entry is now free from code with logic for diff generation, and also performance is not affected as timings for `git log --raw --no-abbrev --no-renames` for navy.git and `linux.git v3.10..v3.11`, just like in previous patch, stay the same. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-20 15:03:24 -07:00
Kirill Smelkov	7e9003c149	tree-diff: show_tree() is not needed We don't need special code for showing added/removed subtree, because we can do the same via diff_tree_sha1, just passing NULL for absent tree. And compared to show_tree(), which was calling show_entry() for every tree entry, that would lead to the same show_entry() callings: show_tree(t): for e in t.entries: show_entry(e) diff_tree_sha1(NULL, new): /* the same applies to (old, NULL) */ diff_tree(t1=NULL, t2) ... if (!t1->size) show_entry(t2) ... and possible overhead is negligible, since after the patch, timing for `git log --raw --no-abbrev --no-renames` for navy.git and `linux.git v3.10..v3.11` is practically the same. So let's say goodbye to show_tree() - it removes some code, but also, and what is important, consolidates more code for showing/recursing into trees into one place. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-04 13:33:47 -08:00
Kirill Smelkov	e906612121	tree-diff: no need to pass match to skip_uninteresting() It is neither used there as input, nor the output written through it, is used outside. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:46:11 -08:00
Kirill Smelkov	e197c2b650	tree-diff: no need to manually verify that there is no mode change for a path Because if there is, such two tree entries would never be compared as equal - the code in base_name_compare() explicitly compares modes, if there is a change for dir bit, even for equal paths, entries would compare as different. The code I'm removing here is from 2005 April `262e82b4` (Fix diff-tree recursion), which pre-dates base_name_compare() introduction in `958ba6c9` (Introduce "base_name_compare()" helper function) by a month. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:46:11 -08:00
Kirill Smelkov	eeb3f32868	combine-diff: move changed-paths scanning logic into its own function Move code for finding paths for which diff(commit,parent_i) is not-empty for all parents to separate function - at present we have generic (and slow) code for this job, which translates 1 n-parent problem to n 1-parent problems and then intersect results, and will be adding another limited, but faster, paths scanning implementation in the next patch. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:46:11 -08:00
Kirill Smelkov	51af1886c7	combine-diff: move show_log_first logic/action out of paths scanning Judging from sample outputs and tests nothing changes in diff -c output, and this change will help later patches, when we'll be refactoring paths scanning into its own function with several variants - the show_log_first logic / code will stay common to all of them. NOTE: only now we have to take care to explicitly not show anything if parents array is empty, as in fact there are some clients in Git code, which calls diff_tree_combined() in such a way. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:46:11 -08:00
Kirill Smelkov	fce135c4ff	tests: add checking that combine-diff emits only correct paths where "correct paths" stands for paths that are different to all parents. Up until now, we were testing combined diff only on one file, or on several files which were all different (t4038-diff-combined.sh). As recent thinko in "simplify intersect_paths() further" showed, and also, since we are going to rework code for finding paths different to all parents, lets write at least basic tests. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:44:57 -08:00
Junio C Hamano	7b1004b0ba	combine-diff: simplify intersect_paths() further Linus once said: I actually wish more people understood the really core low-level kind of coding. Not big, complex stuff like the lockless name lookup, but simply good use of pointers-to-pointers etc. For example, I've seen too many people who delete a singly-linked list entry by keeping track of the "prev" entry, and then to delete the entry, doing something like if (prev) prev->next = entry->next; else list_head = entry->next; and whenever I see code like that, I just go "This person doesn't understand pointers". And it's sadly quite common. People who understand pointers just use a "pointer to the entry pointer", and initialize that with the address of the list_head. And then as they traverse the list, they can remove the entry without using any conditionals, by just doing a "*pp = entry->next". Applying that simplification lets us lose 7 lines from this function even while adding 2 lines of comment. I was tempted to squash this into the original commit, but because the benchmarking described in the commit log is without this simplification, I decided to keep it a separate follow-up patch. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:44:57 -08:00
Kirill Smelkov	af82c7880f	combine-diff: combine_diff_path.len is not needed anymore The field was used in order to speed-up name comparison and also to mark removed paths by setting it to 0. Because the updated code does significantly less strcmp and also just removes paths from the list and free right after we know a path will not be needed, it is not needed anymore. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:44:57 -08:00
Kirill Smelkov	8518ff8fab	combine-diff: optimize combine_diff_path sets intersection When generating combined diff, for each commit, we intersect diff paths from diff(parent_0,commit) to diff(parent_i,commit) comparing all paths pairs, i.e. doing it the quadratic way. That is correct, but could be optimized. Paths come from trees in sorted (= tree) order, and so does diff_tree() emits resulting paths in that order too. Now if we look at diffcore transformations, all of them, except diffcore_order, preserve resulting path ordering: - skip_stat_unmatch, grep, pickaxe, filter -- just skip elements -> order stays preserved - break -- just breaks diff for a path, adding path dup after the path -> order stays preserved - detect rename/copy -- resulting paths are emitted sorted (verified empirically) So only diffcore_order changes diff paths ordering. But diffcore_order meaning affects only presentation - i.e. only how to show the diff, so we could do all the internal computations without paths reordering, and order only resultant paths set. This is faster, since, if we know two paths sets are all ordered, their intersection could be done in linear time. This patch does just that. Timings for `git log --raw --no-abbrev --no-renames` without `-c` ("git log") and with `-c` ("git log -c") before and after the patch are as follows: linux.git v3.10..v3.11 log log -c before 1.9s 20.4s after 1.9s 16.6s navy.git (private repo) log log -c before 0.83s 15.6s after 0.83s 2.1s P.S. I think linux.git case is sped up not so much as the second one, since in navy.git, there are more exotic (subtree, etc) merges. P.P.S. My tracing showed that the rest of the time (16.6s vs 1.9s) is usually spent in computing huge diffs from commit to second parent. Will try to deal with it, if I'll have time. P.P.P.S. For combine_diff_path, ->len is not needed anymore - will remove it in the next noisy cleanup path, to maintain good signal/noise ratio here. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:44:57 -08:00
Kirill Smelkov	91921ceff6	diff test: add tests for combine-diff with orderfile In the next patch combine-diff will have special code-path for taking orderfile into account. Prepare for making changes by introducing coverage tests for that case. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:44:57 -08:00
Kirill Smelkov	1df4320fa2	diffcore-order: export generic ordering interface diffcore_order() interface only accepts a queue of `struct diff_filepair`. In the next patches, we'll want to order `struct combine_diff_path` by path, so let's first rework diffcore-order to also provide generic low-level interface for ordering arbitrary objects, provided they have path accessors. The new interface is: - `struct obj_order` for describing objects to ordering routine, and - order_objects() for actually doing the ordering work. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:44:57 -08:00
Kirill Smelkov	7146e66f08	tree-walk: finally switch over tree descriptors to contain a pre-parsed entry This continues `4651ece8` (Switch over tree descriptors to contain a pre-parsed entry) and moves the only rest computational part mode = canon_mode(mode) from tree_entry_extract() to tree entry decode phase - to decode_tree_entry(). The reason to do it, is that canon_mode() is at least 2 conditional jumps for regular files, and that could be noticeable should canon_mode() be invoked several times. That does not matter for current Git codebase, where typical tree traversal is while (t->size) { sha1 = tree_entry_extract(t, &path, &mode); ... update_tree_entry(t); } i.e. we do t -> sha1,path.mode "extraction" only once per entry. In such cases, it does not matter performance-wise, where that mode canonicalization is done - either once in tree_entry_extract(), or once in decode_tree_entry() called by update_tree_entry() - it is approximately the same. But for future code, which could need to work with several tree_desc's in parallel, it could be handy to operate on tree_desc descriptors, and do "extracts" only when needed, or at all, access only relevant part of it through structure fields directly. And for such situations, having canon_mode() be done once in decode phase is better - we won't need to pay the performance price of 2 extra conditional jumps on every t->mode access. So let's move mode canonicalization to decode_tree_entry(). That was the final bit. Now after tree entry is decoded, it is fully ready and could be accessed either directly via field, or through tree_entry_extract() which this time got really "totally trivial". Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:43:29 -08:00
Kirill Smelkov	6275c91c08	revision: convert to using diff_tree_sha1() Since diff_tree_sha1() can now accept empty trees via NULL sha1, we could just call it without manually reading trees into tree_desc and duplicating code. Besides, that if (!tree) return 0; looked suspect - we were saying an invalid tree != empty tree, but maybe it is better to just say the tree is invalid here, which is what diff_tree_sha1() does for such case. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-05 10:51:16 -08:00
Kirill Smelkov	7bc4ec01dd	line-log: convert to using diff_tree_sha1() Since diff_tree_sha1() can now accept empty trees via NULL sha1, we could just call it without manually reading trees into tree_desc and duplicating code. Cc: Thomas Rast <tr@thomasrast.ch> Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-05 10:50:36 -08:00
Kirill Smelkov	0b707c3319	tree-diff: convert diff_root_tree_sha1() to just call diff_tree_sha1 with old=NULL Now since diff_tree_sha1 understands NULL for both old and new, we could indicate an empty tree for root commit by providing just NULL for old sha1. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-05 10:49:07 -08:00
Kirill Smelkov	791303284c	tree-diff: allow diff_tree_sha1 to accept NULL sha1 which would mean that corresponding tree - old or new - is empty. As followup patches will show, that functionality was already needed in several places of Git codebase, but there, we were preparing empty tree_desc objects by hand, with some code duplication. For handling sha1 = NULL case, let's reuse fill_tree_descriptor() which returns just empty tree_desc in that case. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-05 10:48:14 -08:00
Junio C Hamano	be961c292f	Git 1.9-rc2 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-31 14:16:06 -08:00
Junio C Hamano	e94ea162db	Merge git://github.com/git-l10n/git-po * 'master' of git://github.com/git-l10n/git-po: l10n: Bulgarian translation of git (222t21f1967u) po/TEAMS: Added Bulgarian team l10n: remove 2 blank translations on Danish, Dutch l10n: zh_CN.po: translate 27 messages (2210t0f0u) l10n: Update Swedish translation (2210t0f0u) [fr] update french translation 2210/2210 l10n: vi.po (2210t): Updated git-core translation l10n: git.pot: v1.9 round 1 (27 new, 11 removed)	2014-01-31 10:52:29 -08:00
Junio C Hamano	3de92cd16d	Merge branch 'jn/pager-lv-default-env' A finishing touch to its test. * jn/pager-lv-default-env: pager test: make fake pager consume all its input	2014-01-31 10:51:57 -08:00
Jonathan Nieder	52c02f658e	pager test: make fake pager consume all its input Otherwise there is a race: if 'git log' finishes writing before the pager terminates and closes the pipe, all is well, and if the pager finishes quickly enough then 'git log' terminates with SIGPIPE. died of signal 13 at /build/buildd/git-1.9~rc1/t/test-terminal.perl line 33. not ok 6 - LESS and LV envvars are set for pagination Noticed on Ubuntu PPA builders, where the race was lost about half the time. Compare v1.7.0.2~6^2 (tests: Fix race condition in t7006-pager, 2010-02-22). Reported-by: Anders Kaseorg <andersk@MIT.EDU> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-31 09:07:17 -08:00
Alexander Shopov	25e2fbb4e2	l10n: Bulgarian translation of git (222t21f1967u) Signed-off-by: Alexander Shopov <ash@kambanaria.org>	2014-01-29 14:29:15 +02:00
Alexander Shopov	5123e7d54f	po/TEAMS: Added Bulgarian team Signed-off-by: Alexander Shopov <ash@kambanaria.org>	2014-01-28 19:16:53 +02:00
Junio C Hamano	bd3e186d81	Git 1.9-rc1 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-27 11:01:35 -08:00
Junio C Hamano	8bba7206b5	Merge branch 'as/tree-walk-fix-aggressive-short-cut' * as/tree-walk-fix-aggressive-short-cut: tree_entry_interesting: match against all pathspecs	2014-01-27 10:48:32 -08:00
Junio C Hamano	1ad5417a26	Merge branch 'ta/doc-http-protocol-in-html' * ta/doc-http-protocol-in-html: http-protocol.txt: don't use uppercase for variable names in "The Negotiation Algorithm" Documentation: make it easier to maintain enumerated documents create HTML for http-protocol.txt	2014-01-27 10:45:59 -08:00
Junio C Hamano	78dc48e4b0	Merge branch 'mh/doc-wo-names' * mh/doc-wo-names: doc: remote author/documentation sections from more pages	2014-01-27 10:45:56 -08:00
Junio C Hamano	69b024dc03	Merge branch 'jk/revision-o-is-in-libgit-a' * jk/revision-o-is-in-libgit-a: Makefile: remove redundant object in git-http{fetch,push}	2014-01-27 10:45:52 -08:00
Junio C Hamano	4110639865	Merge branch 'sb/repack-in-c' "git repack --max-pack-size=8g" stopped being parsed correctly when the command was reimplemented in C. * sb/repack-in-c: repack: propagate pack-objects options as strings repack: make parsed string options const-correct repack: fix typo in max-pack-size option	2014-01-27 10:45:49 -08:00
Junio C Hamano	cdc40bdb69	Merge branch 'jk/test-fixes' * jk/test-fixes: t7700: do not use "touch" unnecessarily t7501: fix "empty commit" test with NO_PERL	2014-01-27 10:45:46 -08:00
Junio C Hamano	017f804efc	Merge branch 'nd/negative-pathspec' * nd/negative-pathspec: tree-walk.c: ignore trailing slash on submodule in tree_entry_interesting()	2014-01-27 10:45:44 -08:00
Junio C Hamano	523f0a25b9	Merge branch 'pw/git-p4' Various "git p4" updates. * pw/git-p4: git p4 doc: use two-line style for options with multiple spellings git p4 test: examine behavior with locked (+l) files git p4: fix an error message when "p4 where" fails git p4: handle files with wildcards when doing RCS scrubbing git p4 test: do not pollute /tmp git p4 test: run as user "author" git p4 test: is_cli_file_writeable succeeds git p4 test: explicitly check p4 wildcard delete git p4: work around p4 bug that causes empty symlinks git p4 test: ensure p4 symlink parsing works git p4 test: wildcards are supported	2014-01-27 10:45:41 -08:00
Junio C Hamano	33d4669aaa	Merge branch 'ss/safe-create-leading-dir-with-slash' "git clone $origin foo\bar\baz" on Windows failed to create the leading directories (i.e. a moral-equivalent of "mkdir -p"). * ss/safe-create-leading-dir-with-slash: safe_create_leading_directories(): on Windows, \ can separate path components	2014-01-27 10:45:37 -08:00
Junio C Hamano	d0956cfa8e	Merge branch 'mh/safe-create-leading-directories' Code clean-up and protection against concurrent write access to the ref namespace. * mh/safe-create-leading-directories: rename_tmp_log(): on SCLD_VANISHED, retry rename_tmp_log(): limit the number of remote_empty_directories() attempts rename_tmp_log(): handle a possible mkdir/rmdir race rename_ref(): extract function rename_tmp_log() remove_dir_recurse(): handle disappearing files and directories remove_dir_recurse(): tighten condition for removing unreadable dir lock_ref_sha1_basic(): if locking fails with ENOENT, retry lock_ref_sha1_basic(): on SCLD_VANISHED, retry safe_create_leading_directories(): add new error value SCLD_VANISHED cmd_init_db(): when creating directories, handle errors conservatively safe_create_leading_directories(): introduce enum for return values safe_create_leading_directories(): always restore slash at end of loop safe_create_leading_directories(): split on first of multiple slashes safe_create_leading_directories(): rename local variable safe_create_leading_directories(): add explicit "slash" pointer safe_create_leading_directories(): reduce scope of local variable safe_create_leading_directories(): fix format of "if" chaining	2014-01-27 10:45:33 -08:00
Junio C Hamano	c380cf85a7	Merge branch 'tr/nth-previous-is-a-commit' * tr/nth-previous-is-a-commit: Documentation: @{-N} can refer to a commit	2014-01-27 10:45:31 -08:00
Junio C Hamano	bf3939901b	Merge branch 'tr/gitk-doc-range-trace' * tr/gitk-doc-range-trace: Documentation/gitk: document -L option	2014-01-27 10:45:23 -08:00
Junio C Hamano	a6bec00145	Merge branch 'jk/mark-edges-uninteresting' Fix performance regression in v1.8.4.x and later. * jk/mark-edges-uninteresting: list-objects: only look at cmdline trees with edge_hint t/perf: time rev-list with UNINTERESTING commits	2014-01-27 10:45:08 -08:00
Junio C Hamano	e049109ef1	Merge branch 'jk/diff-filespec-cleanup' * jk/diff-filespec-cleanup: diff_filespec: use only 2 bits for is_binary flag diff_filespec: reorder is_binary field diff_filespec: drop xfrm_flags field diff_filespec: drop funcname_pattern_ident field diff_filespec: reorder dirty_submodule macro definitions	2014-01-27 10:45:03 -08:00

1 2 3 4 5 ...