mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-18 06:54:55 +01:00

586 lines

15 KiB

C

Raw Normal View History

Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00			`/*`
			`* LibXDiff by Davide Libenzi ( File Differential Library )`
			`* Copyright (C) 2003 Davide Libenzi`
			`*`
			`* This library is free software; you can redistribute it and/or`
			`* modify it under the terms of the GNU Lesser General Public`
			`* License as published by the Free Software Foundation; either`
			`* version 2.1 of the License, or (at your option) any later version.`
			`*`
			`* This library is distributed in the hope that it will be useful,`
			`* but WITHOUT ANY WARRANTY; without even the implied warranty of`
			`* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU`
			`* Lesser General Public License for more details.`
			`*`
			`* You should have received a copy of the GNU Lesser General Public`
			`* License along with this library; if not, write to the Free Software`
			`* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA`
			`*`
			`* Davide Libenzi <davidel@xmailserver.org>`
			`*`
			`*/`

			`#include "xinclude.h"`



			`#define XDL_MAX_COST_MIN 256`
			`#define XDL_HEUR_MIN_COST 256`
refactor: use bitsizeof() instead of 8 * sizeof() Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-07-22 23:34:34 +02:00			`#define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1)`
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00			`#define XDL_SNAKE_CNT 20`
			`#define XDL_K_HEUR 4`



			`typedef struct s_xdpsplit {`
			`long i1, i2;`
			`int min_lo, min_hi;`
			`} xdpsplit_t;`




			`static long xdl_split(unsigned long const *ha1, long off1, long lim1,`
			`unsigned long const *ha2, long off2, long lim2,`
			`long kvdf, long kvdb, int need_min, xdpsplit_t *spl,`
			`xdalgoenv_t *xenv);`
			`static xdchange_t xdl_add_change(xdchange_t xscr, long i1, long i2, long chg1, long chg2);`
xdiff: post-process hunks to make them consistent. 2006-04-14 01:45:13 +02:00
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00



			`/*`
			`* See "An O(ND) Difference Algorithm and its Variations", by Eugene Myers.`
			`* Basically considers a "box" (off1, off2, lim1, lim2) and scan from both`
			`* the forward diagonal starting from (off1, off2) and the backward diagonal`
			`* starting from (lim1, lim2). If the K values on the same diagonal crosses`
			`* returns the furthest point of reach. We might end up having to expensive`
			`* cases using this algorithm is full, so a little bit of heuristic is needed`
			`* to cut the search and to return a suboptimal point.`
			`*/`
			`static long xdl_split(unsigned long const *ha1, long off1, long lim1,`
			`unsigned long const *ha2, long off2, long lim2,`
			`long kvdf, long kvdb, int need_min, xdpsplit_t *spl,`
			`xdalgoenv_t *xenv) {`
			`long dmin = off1 - lim2, dmax = lim1 - off2;`
			`long fmid = off1 - off2, bmid = lim1 - lim2;`
			`long odd = (fmid - bmid) & 1;`
			`long fmin = fmid, fmax = fmid;`
			`long bmin = bmid, bmax = bmid;`
			`long ec, d, i1, i2, prev1, best, dd, v, k;`

			`/*`
			`* Set initial diagonal values for both forward and backward path.`
			`*/`
			`kvdf[fmid] = off1;`
			`kvdb[bmid] = lim1;`

			`for (ec = 1;; ec++) {`
			`int got_snake = 0;`

			`/*`
			`* We need to extent the diagonal "domain" by one. If the next`
			`* values exits the box boundaries we need to change it in the`
			`* opposite direction because (max - min) must be a power of two.`
Fix more typos, primarily in the code The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-10 07:50:18 +02:00			`* Also we initialize the external K value to -1 so that we can`
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00			`* avoid extra conditions check inside the core loop.`
			`*/`
			`if (fmin > dmin)`
			`kvdf[--fmin - 1] = -1;`
			`else`
			`++fmin;`
			`if (fmax < dmax)`
			`kvdf[++fmax + 1] = -1;`
			`else`
			`--fmax;`

			`for (d = fmax; d >= fmin; d -= 2) {`
			`if (kvdf[d - 1] >= kvdf[d + 1])`
			`i1 = kvdf[d - 1] + 1;`
			`else`
			`i1 = kvdf[d + 1];`
			`prev1 = i1;`
			`i2 = i1 - d;`
			`for (; i1 < lim1 && i2 < lim2 && ha1[i1] == ha2[i2]; i1++, i2++);`
			`if (i1 - prev1 > xenv->snake_cnt)`
			`got_snake = 1;`
			`kvdf[d] = i1;`
			`if (odd && bmin <= d && d <= bmax && kvdb[d] <= i1) {`
			`spl->i1 = i1;`
			`spl->i2 = i2;`
			`spl->min_lo = spl->min_hi = 1;`
			`return ec;`
			`}`
			`}`

			`/*`
			`* We need to extent the diagonal "domain" by one. If the next`
			`* values exits the box boundaries we need to change it in the`
			`* opposite direction because (max - min) must be a power of two.`
Fix more typos, primarily in the code The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-10 07:50:18 +02:00			`* Also we initialize the external K value to -1 so that we can`
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00			`* avoid extra conditions check inside the core loop.`
			`*/`
			`if (bmin > dmin)`
			`kvdb[--bmin - 1] = XDL_LINE_MAX;`
			`else`
			`++bmin;`
			`if (bmax < dmax)`
			`kvdb[++bmax + 1] = XDL_LINE_MAX;`
			`else`
			`--bmax;`

			`for (d = bmax; d >= bmin; d -= 2) {`
			`if (kvdb[d - 1] < kvdb[d + 1])`
			`i1 = kvdb[d - 1];`
			`else`
			`i1 = kvdb[d + 1] - 1;`
			`prev1 = i1;`
			`i2 = i1 - d;`
			`for (; i1 > off1 && i2 > off2 && ha1[i1 - 1] == ha2[i2 - 1]; i1--, i2--);`
			`if (prev1 - i1 > xenv->snake_cnt)`
			`got_snake = 1;`
			`kvdb[d] = i1;`
			`if (!odd && fmin <= d && d <= fmax && i1 <= kvdf[d]) {`
			`spl->i1 = i1;`
			`spl->i2 = i2;`
			`spl->min_lo = spl->min_hi = 1;`
			`return ec;`
			`}`
			`}`

			`if (need_min)`
			`continue;`

			`/*`
			`* If the edit cost is above the heuristic trigger and if`
			`* we got a good snake, we sample current diagonals to see`
			`* if some of the, have reached an "interesting" path. Our`
			`* measure is a function of the distance from the diagonal`
			`* corner (i1 + i2) penalized with the distance from the`
			`* mid diagonal itself. If this value is above the current`
			`* edit cost times a magic factor (XDL_K_HEUR) we consider`
			`* it interesting.`
			`*/`
			`if (got_snake && ec > xenv->heur_min) {`
			`for (best = 0, d = fmax; d >= fmin; d -= 2) {`
			`dd = d > fmid ? d - fmid: fmid - d;`
			`i1 = kvdf[d];`
			`i2 = i1 - d;`
			`v = (i1 - off1) + (i2 - off2) - dd;`

			`if (v > XDL_K_HEUR * ec && v > best &&`
			`off1 + xenv->snake_cnt <= i1 && i1 < lim1 &&`
			`off2 + xenv->snake_cnt <= i2 && i2 < lim2) {`
			`for (k = 1; ha1[i1 - k] == ha2[i2 - k]; k++)`
			`if (k == xenv->snake_cnt) {`
			`best = v;`
			`spl->i1 = i1;`
			`spl->i2 = i2;`
			`break;`
			`}`
			`}`
			`}`
			`if (best > 0) {`
			`spl->min_lo = 1;`
			`spl->min_hi = 0;`
			`return ec;`
			`}`

			`for (best = 0, d = bmax; d >= bmin; d -= 2) {`
			`dd = d > bmid ? d - bmid: bmid - d;`
			`i1 = kvdb[d];`
			`i2 = i1 - d;`
			`v = (lim1 - i1) + (lim2 - i2) - dd;`

			`if (v > XDL_K_HEUR * ec && v > best &&`
			`off1 < i1 && i1 <= lim1 - xenv->snake_cnt &&`
			`off2 < i2 && i2 <= lim2 - xenv->snake_cnt) {`
			`for (k = 0; ha1[i1 + k] == ha2[i2 + k]; k++)`
			`if (k == xenv->snake_cnt - 1) {`
			`best = v;`
			`spl->i1 = i1;`
			`spl->i2 = i2;`
			`break;`
			`}`
			`}`
			`}`
			`if (best > 0) {`
			`spl->min_lo = 0;`
			`spl->min_hi = 1;`
			`return ec;`
			`}`
			`}`

			`/*`
			`* Enough is enough. We spent too much time here and now we collect`
			`* the furthest reaching path using the (i1 + i2) measure.`
			`*/`
			`if (ec >= xenv->mxcost) {`
			`long fbest, fbest1, bbest, bbest1;`

xdiff/xdiffi.c: fix warnings about possibly uninitialized variables Compiling this module gave the following warnings (some double dutch!): xdiff/xdiffi.c: In functie 'xdl_recs_cmp': xdiff/xdiffi.c:298: let op: 'spl.i1' may be used uninitialized in this function xdiff/xdiffi.c:298: let op: 'spl.i2' may be used uninitialized in this function xdiff/xdiffi.c:219: let op: 'fbest1' may be used uninitialized in this function xdiff/xdiffi.c:219: let op: 'bbest1' may be used uninitialized in this function A superficial tracking of their usage, without deeper knowledge about the algorithm, indeed confirms that there are code paths on which these variables will be used uninitialized. In practice these code paths might never be reached, but then these fixes will not change the algorithm. If these code paths are ever reached we now at least have a predictable outcome. And should the very small performance impact of these initializations be noticeable, then they should at least be replaced by comments why certain code paths will never be reached. Some extra initializations in this patch now fix the warnings. 2006-04-08 17:27:20 +02:00			`fbest = fbest1 = -1;`
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00			`for (d = fmax; d >= fmin; d -= 2) {`
			`i1 = XDL_MIN(kvdf[d], lim1);`
			`i2 = i1 - d;`
			`if (lim2 < i2)`
			`i1 = lim2 + d, i2 = lim2;`
			`if (fbest < i1 + i2) {`
			`fbest = i1 + i2;`
			`fbest1 = i1;`
			`}`
			`}`

xdiff/xdiffi.c: fix warnings about possibly uninitialized variables Compiling this module gave the following warnings (some double dutch!): xdiff/xdiffi.c: In functie 'xdl_recs_cmp': xdiff/xdiffi.c:298: let op: 'spl.i1' may be used uninitialized in this function xdiff/xdiffi.c:298: let op: 'spl.i2' may be used uninitialized in this function xdiff/xdiffi.c:219: let op: 'fbest1' may be used uninitialized in this function xdiff/xdiffi.c:219: let op: 'bbest1' may be used uninitialized in this function A superficial tracking of their usage, without deeper knowledge about the algorithm, indeed confirms that there are code paths on which these variables will be used uninitialized. In practice these code paths might never be reached, but then these fixes will not change the algorithm. If these code paths are ever reached we now at least have a predictable outcome. And should the very small performance impact of these initializations be noticeable, then they should at least be replaced by comments why certain code paths will never be reached. Some extra initializations in this patch now fix the warnings. 2006-04-08 17:27:20 +02:00			`bbest = bbest1 = XDL_LINE_MAX;`
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00			`for (d = bmax; d >= bmin; d -= 2) {`
			`i1 = XDL_MAX(off1, kvdb[d]);`
			`i2 = i1 - d;`
			`if (i2 < off2)`
			`i1 = off2 + d, i2 = off2;`
			`if (i1 + i2 < bbest) {`
			`bbest = i1 + i2;`
			`bbest1 = i1;`
			`}`
			`}`

			`if ((lim1 + lim2) - bbest < fbest - (off1 + off2)) {`
			`spl->i1 = fbest1;`
			`spl->i2 = fbest - fbest1;`
			`spl->min_lo = 1;`
			`spl->min_hi = 0;`
			`} else {`
			`spl->i1 = bbest1;`
			`spl->i2 = bbest - bbest1;`
			`spl->min_lo = 0;`
			`spl->min_hi = 1;`
			`}`
			`return ec;`
			`}`
			`}`
			`}`


			`/*`
			`* Rule: "Divide et Impera". Recursively split the box in sub-boxes by calling`
			`* the box splitting function. Note that the real job (marking changed lines)`
			`* is done in the two boundary reaching checks.`
			`*/`
			`int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1,`
			`diffdata_t *dd2, long off2, long lim2,`
			`long kvdf, long kvdb, int need_min, xdalgoenv_t *xenv) {`
			`unsigned long const ha1 = dd1->ha, ha2 = dd2->ha;`

			`/*`
			`* Shrink the box by walking through each diagonal snake (SW and NE).`
			`*/`
			`for (; off1 < lim1 && off2 < lim2 && ha1[off1] == ha2[off2]; off1++, off2++);`
			`for (; off1 < lim1 && off2 < lim2 && ha1[lim1 - 1] == ha2[lim2 - 1]; lim1--, lim2--);`

			`/*`
			`* If one dimension is empty, then all records on the other one must`
			`* be obviously changed.`
			`*/`
			`if (off1 == lim1) {`
			`char *rchg2 = dd2->rchg;`
			`long *rindex2 = dd2->rindex;`

			`for (; off2 < lim2; off2++)`
			`rchg2[rindex2[off2]] = 1;`
			`} else if (off2 == lim2) {`
			`char *rchg1 = dd1->rchg;`
			`long *rindex1 = dd1->rindex;`

			`for (; off1 < lim1; off1++)`
			`rchg1[rindex1[off1]] = 1;`
			`} else {`
			`xdpsplit_t spl;`
xdiff/xdiffi.c: fix warnings about possibly uninitialized variables Compiling this module gave the following warnings (some double dutch!): xdiff/xdiffi.c: In functie 'xdl_recs_cmp': xdiff/xdiffi.c:298: let op: 'spl.i1' may be used uninitialized in this function xdiff/xdiffi.c:298: let op: 'spl.i2' may be used uninitialized in this function xdiff/xdiffi.c:219: let op: 'fbest1' may be used uninitialized in this function xdiff/xdiffi.c:219: let op: 'bbest1' may be used uninitialized in this function A superficial tracking of their usage, without deeper knowledge about the algorithm, indeed confirms that there are code paths on which these variables will be used uninitialized. In practice these code paths might never be reached, but then these fixes will not change the algorithm. If these code paths are ever reached we now at least have a predictable outcome. And should the very small performance impact of these initializations be noticeable, then they should at least be replaced by comments why certain code paths will never be reached. Some extra initializations in this patch now fix the warnings. 2006-04-08 17:27:20 +02:00			`spl.i1 = spl.i2 = 0;`
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00
			`/*`
			`* Divide ...`
			`*/`
Fix various dead stores found by the clang static analyzer http-push.c::finish_request(): request is initialized by the for loop index-pack.c::free_base_data(): b is initialized by the for loop merge-recursive.c::process_renames(): move compare to narrower scope, and remove unused assignments to it remove unused variable renames2 xdiff/xdiffi.c::xdl_recs_cmp(): remove unused variable ec xdiff/xemit.c::xdl_emit_diff(): xche is always overwritten Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-03-15 22:01:20 +01:00			`if (xdl_split(ha1, off1, lim1, ha2, off2, lim2, kvdf, kvdb,`
			`need_min, &spl, xenv) < 0) {`
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00
			`return -1;`
			`}`

			`/*`
			`* ... et Impera.`
			`*/`
			`if (xdl_recs_cmp(dd1, off1, spl.i1, dd2, off2, spl.i2,`
			`kvdf, kvdb, spl.min_lo, xenv) < 0 \|\|`
			`xdl_recs_cmp(dd1, spl.i1, lim1, dd2, spl.i2, lim2,`
			`kvdf, kvdb, spl.min_hi, xenv) < 0) {`

			`return -1;`
			`}`
			`}`

			`return 0;`
			`}`


			`int xdl_do_diff(mmfile_t mf1, mmfile_t mf2, xpparam_t const *xpp,`
			`xdfenv_t *xe) {`
			`long ndiags;`
			`long kvd, kvdf, *kvdb;`
			`xdalgoenv_t xenv;`
			`diffdata_t dd1, dd2;`

xdiff: PATIENCE/HISTOGRAM are not independent option bits Because the default Myers, patience and histogram algorithms cannot be in effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not independent bits. Instead of wasting one bit per algorithm, define a few macros to access the few bits they occupy and update the code that access them. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-20 00:36:55 +01:00			`if (XDF_DIFF_ALG(xpp->flags) == XDF_PATIENCE_DIFF)`
Implement the patience diff algorithm The patience diff algorithm produces slightly more intuitive output than the classic Myers algorithm, as it does not try to minimize the number of +/- lines first, but tries to preserve the lines that are unique. To this end, it first determines lines that are unique in both files, then the maximal sequence which preserves the order (relative to both files) is extracted. Starting from this initial set of common lines, the rest of the lines is handled recursively, with Myers' algorithm as a fallback when the patience algorithm fails (due to no common unique lines). This patch includes memory leak fixes by Pierre Habouzit. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-07 18:04:09 +01:00			`return xdl_do_patience_diff(mf1, mf2, xpp, xe);`

xdiff: PATIENCE/HISTOGRAM are not independent option bits Because the default Myers, patience and histogram algorithms cannot be in effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not independent bits. Instead of wasting one bit per algorithm, define a few macros to access the few bits they occupy and update the code that access them. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-20 00:36:55 +01:00			`if (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF)`
teach --histogram to diff Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show that it is faster than its --patience cousin, as well as the default Meyers algorithm. The implementation has been reworked to use structs and pointers, instead of bitmasks, thus doing away with JGit's 2^28 line limit. We also use xdiff's default hash table implementation (xdl_hash_bits() with XDL_HASHLONG()) for convenience. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-07-12 08:10:25 +02:00			`return xdl_do_histogram_diff(mf1, mf2, xpp, xe);`

Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00			`if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) {`

			`return -1;`
			`}`

			`/*`
			`* Allocate and setup K vectors to be used by the differential algorithm.`
			`* One is to store the forward path and one to store the backward path.`
			`*/`
			`ndiags = xe->xdf1.nreff + xe->xdf2.nreff + 3;`
			`if (!(kvd = (long ) xdl_malloc((2 ndiags + 2) * sizeof(long)))) {`

			`xdl_free_env(xe);`
			`return -1;`
			`}`
			`kvdf = kvd;`
			`kvdb = kvdf + ndiags;`
			`kvdf += xe->xdf2.nreff + 1;`
			`kvdb += xe->xdf2.nreff + 1;`

Clean-up trivially redundant diff. Also corrects the line numbers in unified output when using zero lines context. 2006-04-04 03:47:55 +02:00			`xenv.mxcost = xdl_bogosqrt(ndiags);`
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00			`if (xenv.mxcost < XDL_MAX_COST_MIN)`
			`xenv.mxcost = XDL_MAX_COST_MIN;`
			`xenv.snake_cnt = XDL_SNAKE_CNT;`
			`xenv.heur_min = XDL_HEUR_MIN_COST;`

			`dd1.nrec = xe->xdf1.nreff;`
			`dd1.ha = xe->xdf1.ha;`
			`dd1.rchg = xe->xdf1.rchg;`
			`dd1.rindex = xe->xdf1.rindex;`
			`dd2.nrec = xe->xdf2.nreff;`
			`dd2.ha = xe->xdf2.ha;`
			`dd2.rchg = xe->xdf2.rchg;`
			`dd2.rindex = xe->xdf2.rindex;`

			`if (xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,`
			`kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0, &xenv) < 0) {`

			`xdl_free(kvd);`
			`xdl_free_env(xe);`
			`return -1;`
			`}`

			`xdl_free(kvd);`

			`return 0;`
			`}`


			`static xdchange_t xdl_add_change(xdchange_t xscr, long i1, long i2, long chg1, long chg2) {`
			`xdchange_t *xch;`

			`if (!(xch = (xdchange_t *) xdl_malloc(sizeof(xdchange_t))))`
			`return NULL;`

			`xch->next = xscr;`
			`xch->i1 = i1;`
			`xch->i2 = i2;`
			`xch->chg1 = chg1;`
			`xch->chg2 = chg2;`

			`return xch;`
			`}`


xdiff: add xdl_merge() This new function implements the functionality of RCS merge, but in-memory. It returns < 0 on error, otherwise the number of conflicts. Finding the conflicting lines can be a very expensive task. You can control the eagerness of this algorithm: - a level value of 0 means that all overlapping changes are treated as conflicts, - a value of 1 means that if the overlapping changes are identical, it is not treated as a conflict. - If you set level to 2, overlapping changes will be analyzed, so that almost identical changes will not result in huge conflicts. Rather, only the conflicting lines will be shown inside conflict markers. With each increasing level, the algorithm gets slower, but more accurate. Note that the code for level 2 depends on the simple definition of mmfile_t specific to git, and therefore it will be harder to port that to LibXDiff. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-11-21 23:24:34 +01:00			`int xdl_change_compact(xdfile_t xdf, xdfile_t xdfo, long flags) {`
xdiff: post-process hunks to make them consistent. 2006-04-14 01:45:13 +02:00			`long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;`
			`char rchg = xdf->rchg, rchgo = xdfo->rchg;`
			`xrecord_t **recs = xdf->recs;`

			`/*`
			`* This is the same of what GNU diff does. Move back and forward`
			`* change groups for a consistent and pretty diff output. This also`
Fix more typos, primarily in the code The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-07-10 07:50:18 +02:00			`* helps in finding joinable change groups and reduce the diff size.`
xdiff: post-process hunks to make them consistent. 2006-04-14 01:45:13 +02:00			`*/`
			`for (ix = ixo = 0;;) {`
			`/*`
			`* Find the first changed line in the to-be-compacted file.`
			`* We need to keep track of both indexes, so if we find a`
			`* changed lines group on the other file, while scanning the`
			`* to-be-compacted file, we need to skip it properly. Note`
			`* that loops that are testing for changed lines on rchg* do`
			`* not need index bounding since the array is prepared with`
			`* a zero at position -1 and N.`
			`*/`
			`for (; ix < nrec && !rchg[ix]; ix++)`
			`while (rchgo[ixo++]);`
			`if (ix == nrec)`
			`break;`

			`/*`
			`* Record the start of a changed-group in the to-be-compacted file`
			`* and find the end of it, on both to-be-compacted and other file`
			`* indexes (ix and ixo).`
			`*/`
			`ixs = ix;`
			`for (ix++; rchg[ix]; ix++);`
			`for (; rchgo[ixo]; ixo++);`

			`do {`
			`grpsiz = ix - ixs;`

			`/*`
			`* If the line before the current change group, is equal to`
			`* the last line of the current change group, shift backward`
			`* the group.`
			`*/`
			`while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha &&`
Teach diff about -b and -w flags This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to diff. The main part of the patch is teaching libxdiff about it. [jc: renamed xdl_line_match() to xdl_recmatch() since the former is used for different purposes in xpatchi.c which is in the parts of the upstream source we do not use.] Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-14 17:40:23 +02:00			`xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {`
xdiff: post-process hunks to make them consistent. 2006-04-14 01:45:13 +02:00			`rchg[--ixs] = 1;`
			`rchg[--ix] = 0;`

			`/*`
			`* This change might have joined two change groups,`
			`* so we try to take this scenario in account by moving`
			`* the start index accordingly (and so the other-file`
			`* end-of-group index).`
			`*/`
			`for (; rchg[ixs - 1]; ixs--);`
			`while (rchgo[--ixo]);`
			`}`

			`/*`
			`* Record the end-of-group position in case we are matched`
			`* with a group of changes in the other file (that is, the`
Fix typos / spelling in comments Signed-off-by: Mike Ralphson <mike@abacus.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-04-17 20:13:30 +02:00			`* change record before the end-of-group index in the other`
xdiff: post-process hunks to make them consistent. 2006-04-14 01:45:13 +02:00			`* file is set).`
			`*/`
			`ixref = rchgo[ixo - 1] ? ix: nrec;`

			`/*`
			`* If the first line of the current change group, is equal to`
			`* the line next of the current change group, shift forward`
			`* the group.`
			`*/`
			`while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&`
Teach diff about -b and -w flags This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to diff. The main part of the patch is teaching libxdiff about it. [jc: renamed xdl_line_match() to xdl_recmatch() since the former is used for different purposes in xpatchi.c which is in the parts of the upstream source we do not use.] Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-14 17:40:23 +02:00			`xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, recs[ix]->ptr, recs[ix]->size, flags)) {`
xdiff: post-process hunks to make them consistent. 2006-04-14 01:45:13 +02:00			`rchg[ixs++] = 0;`
			`rchg[ix++] = 1;`

			`/*`
			`* This change might have joined two change groups,`
			`* so we try to take this scenario in account by moving`
			`* the start index accordingly (and so the other-file`
			`* end-of-group index). Keep tracking the reference`
			`* index in case we are shifting together with a`
			`* corresponding group of changes in the other file.`
			`*/`
			`for (; rchg[ix]; ix++);`
			`while (rchgo[++ixo])`
			`ixref = ix;`
			`}`
			`} while (grpsiz != ix - ixs);`

			`/*`
			`* Try to move back the possibly merged group of changes, to match`
			`* the recorded postion in the other file.`
			`*/`
			`while (ixref < ix) {`
			`rchg[--ixs] = 1;`
			`rchg[--ix] = 0;`
			`while (rchgo[--ixo]);`
			`}`
			`}`

			`return 0;`
			`}`


Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00			`int xdl_build_script(xdfenv_t xe, xdchange_t *xscr) {`
			`xdchange_t cscr = NULL, xch;`
			`char rchg1 = xe->xdf1.rchg, rchg2 = xe->xdf2.rchg;`
			`long i1, i2, l1, l2;`

			`/*`
			`* Trivial. Collects "groups" of changes and creates an edit script.`
			`*/`
			`for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 \|\| i2 >= 0; i1--, i2--)`
			`if (rchg1[i1 - 1] \|\| rchg2[i2 - 1]) {`
			`for (l1 = i1; rchg1[i1 - 1]; i1--);`
			`for (l2 = i2; rchg2[i2 - 1]; i2--);`

			`if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) {`
			`xdl_free_script(cscr);`
			`return -1;`
			`}`
			`cscr = xch;`
			`}`

			`*xscr = cscr;`

			`return 0;`
			`}`


			`void xdl_free_script(xdchange_t *xscr) {`
			`xdchange_t *xch;`

			`while ((xch = xscr) != NULL) {`
			`xscr = xscr->next;`
			`xdl_free(xch);`
			`}`
			`}`

xdiff: add hunk_func() Add a way to register a callback function that is gets passed the start line and line count of each hunk of a diff. Only standard types are used. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-09 22:20:55 +02:00			`static int xdl_call_hunk_func(xdfenv_t xe, xdchange_t xscr, xdemitcb_t *ecb,`
			`xdemitconf_t const *xecfg)`
			`{`
			`xdchange_t xch, xche;`

			`for (xch = xscr; xch; xch = xche->next) {`
			`xche = xdl_get_hunk(xch, xecfg);`
			`if (xecfg->hunk_func(xch->i1, xche->i1 + xche->chg1 - xch->i1,`
			`xch->i2, xche->i2 + xche->chg2 - xch->i2,`
			`ecb->priv) < 0)`
			`return -1;`
			`}`
			`return 0;`
			`}`
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00
			`int xdl_diff(mmfile_t mf1, mmfile_t mf2, xpparam_t const *xpp,`
			`xdemitconf_t const xecfg, xdemitcb_t ecb) {`
			`xdchange_t *xscr;`
			`xdfenv_t xe;`
xdiff: remove emit_func() and xdi_diff_hunks() The functions are unused now, remove them. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-09 22:24:34 +02:00			`emit_func_t ef = xecfg->hunk_func ? xdl_call_hunk_func : xdl_emit_diff;`
xdiff: add hunk_func() Add a way to register a callback function that is gets passed the start line and line count of each hunk of a diff. Only standard types are used. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-05-09 22:20:55 +02:00
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00			`if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {`

			`return -1;`
			`}`
Teach diff about -b and -w flags This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to diff. The main part of the patch is teaching libxdiff about it. [jc: renamed xdl_line_match() to xdl_recmatch() since the former is used for different purposes in xpatchi.c which is in the parts of the upstream source we do not use.] Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-14 17:40:23 +02:00			`if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 \|\|`
			`xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 \|\|`
xdiff: post-process hunks to make them consistent. 2006-04-14 01:45:13 +02:00			`xdl_build_script(&xe, &xscr) < 0) {`
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00
			`xdl_free_env(&xe);`
			`return -1;`
			`}`
			`if (xscr) {`
Allow alternate "low-level" emit function from xdl_diff For some users (e.g. git blame), getting textual patch output is just extra work, as they can get all the information they need from the low- level diff structures. Allow for an alternate low-level emit function to be defined to allow bypassing the textual patch generation; set xemitconf_t's emit_func member to enable this. The (void ()()) type is pretty ugly, but the alternative would be to include most of the private xdiff headers in xdiff.h to get the types required for the "proper" function prototype. Also, a (void ) won't work, as ANSI C doesn't allow a function pointer to be cast to an object pointer. Signed-off-by: Brian Downing <bdowning@lavos.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-25 15:30:54 +02:00			`if (ef(&xe, xscr, ecb, xecfg) < 0) {`
Use a real built-in diff generator This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-25 05:13:22 +01:00
			`xdl_free_script(xscr);`
			`xdl_free_env(&xe);`
			`return -1;`
			`}`
			`xdl_free_script(xscr);`
			`}`
			`xdl_free_env(&xe);`

			`return 0;`
			`}`