mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-18 15:04:49 +01:00

257 lines

6.3 KiB

C

[PATCH] Deltification library work by Nicolas Pitre. This patch adds the basic library functions to create and replay delta information. Also included is a test-delta utility to validate the code. diff-delta was based on LibXDiff written by Davide Libenzi Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-19 16:27:14 +02:00			`/*`
			`* diff-delta.c: generate a delta between two buffers`
			`*`
			`* Many parts of this file have been lifted from LibXDiff version 0.10.`
			`* http://www.xmailserver.org/xdiff-lib.html`
			`*`
			`* LibXDiff was written by Davide Libenzi <davidel@xmailserver.org>`
			`* Copyright (C) 2003 Davide Libenzi`
			`*`
			`* Many mods for GIT usage by Nicolas Pitre <nico@cam.org>, (C) 2005.`
			`*`
			`* This file is free software; you can redistribute it and/or`
			`* modify it under the terms of the GNU Lesser General Public`
			`* License as published by the Free Software Foundation; either`
			`* version 2.1 of the License, or (at your option) any later version.`
			`*`
			`* Use of this within git automatically means that the LGPL`
			`* licensing gets turned into GPLv2 within this project.`
			`*/`

			`#include <stdlib.h>`
diff-delta: big code simplification This is much smaller and hopefully clearer code now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 02:43:17 +01:00			`#include <string.h>`
			`#include "delta.h"`


			`struct index {`
			`const unsigned char *ptr;`
			`struct index *next;`
			`};`
			`static struct index ** delta_index(const unsigned char *buf,`
			`unsigned long bufsize,`
diff-delta: bound hash list length to avoid O(mn) behavior The diff-delta code can exhibit O(mn) behavior with some patological data set where most hash entries end up in the same hash bucket. The latest code rework reduced the block size making it particularly vulnerable to this issue, but the issue was always there and can be triggered regardless of the block size. This patch does two things: 1) the hashing has been reworked to offer a better distribution to atenuate the problem a bit, and 2) a limit is imposed to the number of entries that can exist in the same hash bucket. Because of the above the code is a bit more expensive on average, but the problematic samples used to diagnoze the issue are now orders of magnitude less expensive to process with only a slight loss in compression. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-28 05:09:55 +01:00			`unsigned long trg_bufsize,`
			`unsigned int *hash_shift)`
			`{`
Revert "Revert "diff-delta: produce optimal pack data"" 2006-02-28 06:37:56 +01:00			`unsigned long hsize;`
			`unsigned int i, hshift, hlimit, *hash_count;`
			`const unsigned char *data;`
			`struct index *entry, **hash;`
			`void *mem;`

			`/* determine index hash size */`
			`hsize = bufsize / 4;`
			`for (i = 8; (1 << i) < hsize && i < 24; i += 2);`
			`hsize = 1 << i;`
			`hshift = (i - 8) / 2;`
			`*hash_shift = hshift;`

			`/* allocate lookup index */`
			`mem = malloc(hsize * sizeof(*hash) + bufsize * sizeof(*entry));`
			`if (!mem)`
			`return NULL;`
			`hash = mem;`
			`entry = mem + hsize * sizeof(*hash);`
			`memset(hash, 0, hsize * sizeof(*hash));`

			`/* allocate an array to count hash entries */`
			`hash_count = calloc(hsize, sizeof(*hash_count));`
			`if (!hash_count) {`
			`free(hash);`
			`return NULL;`
			`}`

			`/* then populate the index */`
			`data = buf + bufsize - 2;`
			`while (data > buf) {`
			`entry->ptr = --data;`
			`i = data[0] ^ ((data[1] ^ (data[2] << hshift)) << hshift);`
			`entry->next = hash[i];`
			`hash[i] = entry++;`
			`hash_count[i]++;`
			`}`
			`/*`
			`* Determine a limit on the number of entries in the same hash`
			`* bucket. This guards us against pathological data sets causing`
			`* really bad hash distribution with most entries in the same hash`
			`* bucket that would bring us to O(m*n) computing costs (m and n`
			`* corresponding to reference and target buffer sizes).`
			`*`
			`* The larger the target buffer, the more important it is to have`
			`* short entry lists in each hash bucket. With such a limit the`
			`* cost is bounded to something more like O(m+n).`
			`*/`
			`hlimit = (1 << 26) / trg_bufsize;`
			`if (hlimit < 16)`
			`hlimit = 16;`

			`/*`
			`* Now make sure none of the hash buckets has more entries than`
			`* we're willing to test. Otherwise we thin out the entry list`
			`* uniformly so that the kept entries stay evenly spread across`
			`* the reference buffer.`
			`*/`
			`for (i = 0; i < hsize; i++) {`
			`if (hash_count[i] < hlimit)`
			`continue;`
			`entry = hash[i];`
			`do {`
			`struct index *keep = entry;`
			`int skip = hash_count[i] / hlimit / 2;`
			`do {`
			`entry = entry->next;`
			`} while(--skip && entry);`
			`keep->next = entry;`
			`} while(entry);`
			`}`
			`free(hash_count);`

			`return hash;`
			`}`
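
The sizing and hashing arithmetic in `delta_index()` can be tried in isolation. The following is a minimal sketch, not part of diff-delta.c: the helper names `index_bits`, `hash3` and `bucket_limit` are hypothetical, and the bodies only restate the expressions from the listing above. It assumes a non-zero target size, which `diff_delta()` guarantees by returning early when either buffer is empty.

```c
#include <assert.h>

/* Hypothetical helper: number of hash bits chosen for a reference buffer.
 * Mirrors the loop above: start at 8 bits, grow by 2, stop at 24. */
static unsigned int index_bits(unsigned long bufsize)
{
	unsigned long hsize = bufsize / 4;
	unsigned int i;

	for (i = 8; (1UL << i) < hsize && i < 24; i += 2)
		;
	return i;
}

/* Hypothetical helper: fold three consecutive bytes into a bucket number.
 * With hshift = (bits - 8) / 2 the result always fits in 'bits' bits. */
static unsigned int hash3(const unsigned char *p, unsigned int hshift)
{
	return p[0] ^ ((p[1] ^ (p[2] << hshift)) << hshift);
}

/* Hypothetical helper: per-bucket entry limit for a given target size.
 * Larger targets get shorter lists, but never fewer than 16 entries. */
static unsigned int bucket_limit(unsigned long trg_bufsize)
{
	unsigned int hlimit;

	assert(trg_bufsize > 0);	/* the caller rejects empty buffers */
	hlimit = (1 << 26) / trg_bufsize;
	return hlimit < 16 ? 16 : hlimit;
}
```

For a 100000-byte reference buffer, `index_bits()` gives 16, so the shift is `(16 - 8) / 2 = 4` and every value of `hash3(p, 4)` is below `1 << 16`. A 1 MiB target yields a bucket limit of 64.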

diff-delta: fold two special tests into one plus cleanups Testing for realloc and size limit can be done with only one test per loop. Make it so and fix a theoretical off-by-one comparison error in the process. The output buffer memory allocation is also bounded by max_size when specified. Finally make some variable unsigned to allow the handling of files up to 4GB in size instead of 2GB. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 02:41:41 +01:00			`/* provide the size of the copy opcode given the block offset and size */`
			`#define COPYOP_SIZE(o, s) \`
			`(!!(o & 0xff) + !!(o & 0xff00) + !!(o & 0xff0000) + !!(o & 0xff000000) + \`
			`!!(s & 0xff) + !!(s & 0xff00) + 1)`

			`/* the maximum size for any opcode */`
			`#define MAX_OP_SIZE COPYOP_SIZE(0xffffffff, 0xffffffff)`
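
To see what `COPYOP_SIZE` counts, here is a small sketch that is not taken from diff-delta.c. `copyop_size` restates the macro as a function, and `emit_copy` is a hypothetical encoder that writes one copy opcode the way this file does: a command byte with bit 0x80 set, followed only by the non-zero bytes of the offset (flag bits 0x01 to 0x08) and of the low 16 bits of the size (flag bits 0x10 and 0x20). Both names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Function form of COPYOP_SIZE: one command byte plus one byte for each
 * non-zero byte of the offset and of the low two bytes of the size. */
static unsigned int copyop_size(unsigned int off, unsigned int size)
{
	return !!(off & 0xff) + !!(off & 0xff00) + !!(off & 0xff0000) +
	       !!(off & 0xff000000) + !!(size & 0xff) + !!(size & 0xff00) + 1;
}

/* Hypothetical encoder: write a copy opcode into 'out' (at least 7 bytes)
 * and return the number of bytes written. */
static size_t emit_copy(unsigned char *out, unsigned int off, unsigned int size)
{
	unsigned char *op = out++;	/* reserve the command byte */
	unsigned int cmd = 0x80;
	int bit;

	assert(op != NULL);
	for (bit = 0; bit < 4; bit++, off >>= 8)
		if (off & 0xff) {
			*out++ = off & 0xff;
			cmd |= 1u << bit;
		}
	for (bit = 4; bit < 6; bit++, size >>= 8)
		if (size & 0xff) {
			*out++ = size & 0xff;
			cmd |= 1u << bit;
		}
	*op = cmd;
	return out - op;
}
```

Copying 0x10 bytes from offset 0x100 takes three bytes: `0x92 0x01 0x10`. With every offset and size byte non-zero the opcode takes seven bytes, which is the value `MAX_OP_SIZE` evaluates to.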

			`void *diff_delta(void *from_buf, unsigned long from_size,`
			`void *to_buf, unsigned long to_size,`
Add a "max_size" parameter to diff_delta() Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens. 2005-06-26 04:30:20 +02:00			`unsigned long *delta_size,`
			`unsigned long max_size)`
			`{`
			`unsigned int i, outpos, outsize, inscnt, hash_shift;`
			`const unsigned char *ref_data, *ref_top, *data, *top;`
			`unsigned char *out;`
			`struct index *entry, **hash;`
			`if (!from_size || !to_size)`
			`return NULL;`
			`hash = delta_index(from_buf, from_size, to_size, &hash_shift);`
			`if (!hash)`
			`return NULL;`

			`outpos = 0;`
			`outsize = 8192;`
			`if (max_size && outsize >= max_size)`
			`outsize = max_size + MAX_OP_SIZE + 1;`
			`out = malloc(outsize);`
			`if (!out) {`
			`free(hash);`
			`return NULL;`
			`}`

small cleanup for diff-delta.c This patch removes unused remnants of the original xdiff source. No functional change. Possible tiny speed improvement. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-15 17:10:32 +01:00			`ref_data = from_buf;`
			`ref_top = from_buf + from_size;`
			`data = to_buf;`
			`top = to_buf + to_size;`

			`/* store reference buffer size */`
[PATCH] denser delta header encoding Since the delta data format is not tied to any actual git object anymore, now is the time to add a small improvement to the delta data header as it is been done for packed object header. This patch allows for reducing the delta header of about 2 bytes and makes for simpler code. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-29 06:27:45 +02:00			`out[outpos++] = from_size;`
			`from_size >>= 7;`
			`while (from_size) {`
			`out[outpos - 1] |= 0x80;`
			`out[outpos++] = from_size;`
			`from_size >>= 7;`
			`}`
			`/* store target buffer size */`
			`out[outpos++] = to_size;`
			`to_size >>= 7;`
			`while (to_size) {`
			`out[outpos - 1] |= 0x80;`
			`out[outpos++] = to_size;`
			`to_size >>= 7;`
			`}`
			`inscnt = 0;`

			`while (data < top) {`
			`unsigned int moff = 0, msize = 0;`
			`if (data + 3 <= top) {`
			`i = data[0] ^ ((data[1] ^ (data[2] << hash_shift)) << hash_shift);`
			`for (entry = hash[i]; entry; entry = entry->next) {`
			`const unsigned char *ref = entry->ptr;`
			`const unsigned char *src = data;`
			`unsigned int ref_size = ref_top - ref;`
			`if (ref_size > top - src)`
			`ref_size = top - src;`
			`if (ref_size > 0x10000)`
			`ref_size = 0x10000;`
			`if (ref_size <= msize)`
			`break;`
			`if (*ref != *src)`
			`continue;`
			`while (ref_size-- && *++src == *++ref);`
			`if (msize < ref - entry->ptr) {`
			`/* this is our best match so far */`
			`msize = ref - entry->ptr;`
			`moff = entry->ptr - ref_data;`
			`}`
			`}`
			`}`

			`if (!msize || msize < COPYOP_SIZE(moff, msize)) {`
			`if (!inscnt)`
			`outpos++;`
			`out[outpos++] = *data++;`
			`inscnt++;`
			`if (inscnt == 0x7f) {`
			`out[outpos - inscnt - 1] = inscnt;`
			`inscnt = 0;`
			`}`
			`} else {`
			`unsigned char *op;`

			`if (inscnt) {`
			`out[outpos - inscnt - 1] = inscnt;`
			`inscnt = 0;`
			`}`

			`data += msize;`
			`op = out + outpos++;`
			`i = 0x80;`

			`if (moff & 0xff) { out[outpos++] = moff; i |= 0x01; }`
			`moff >>= 8;`
			`if (moff & 0xff) { out[outpos++] = moff; i |= 0x02; }`
			`moff >>= 8;`
			`if (moff & 0xff) { out[outpos++] = moff; i |= 0x04; }`
			`moff >>= 8;`
			`if (moff & 0xff) { out[outpos++] = moff; i |= 0x08; }`

			`if (msize & 0xff) { out[outpos++] = msize; i |= 0x10; }`
			`msize >>= 8;`
			`if (msize & 0xff) { out[outpos++] = msize; i |= 0x20; }`

			`*op = i;`
			`}`

diff-delta: fold two special tests into one plus cleanups Testing for realloc and size limit can be done with only one test per loop. Make it so and fix a theoretical off-by-one comparison error in the process. The output buffer memory allocation is also bounded by max_size when specified. Finally make some variable unsigned to allow the handling of files up to 4GB in size instead of 2GB. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-22 02:41:41 +01:00			`if (outpos >= outsize - MAX_OP_SIZE) {`
			`void *tmp = out;`
			`outsize = outsize * 3 / 2;`
			`if (max_size && outsize >= max_size)`
			`outsize = max_size + MAX_OP_SIZE + 1;`
			`if (max_size && outpos > max_size)`
			`out = NULL;`
			`else`
			`out = realloc(out, outsize);`
			`if (!out) {`
			`free(tmp);`
			`free(hash);`
			`return NULL;`
			`}`
			`}`
			`}`

			`if (inscnt)`
			`out[outpos - inscnt - 1] = inscnt;`

			`free(hash);`
			`*delta_size = outpos;`
			`return out;`
			`}`
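
The copy-op branch above writes a command byte with the high bit set, followed only by the non-zero bytes of the offset and size, with one flag bit per byte present. A minimal sketch of that encoding and its inverse follows. The names `encode_copy` and `decode_copy` are hypothetical helpers and not part of diff-delta.c, and the decoder's treatment of an all-zero size as 0x10000 is an assumption based on the convention used by git's delta application code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: encode one copy op into out[] (needs up to 7 bytes).
 * Returns the number of bytes written, command byte included.
 */
static size_t encode_copy(unsigned char *out, unsigned int off, unsigned int size)
{
	unsigned char *op = out++;	/* reserve room for the command byte */
	unsigned char cmd = 0x80;	/* high bit marks a copy op */
	int k;

	/* emit only the non-zero offset bytes, flagging each in bits 0..3 */
	for (k = 0; k < 4; k++, off >>= 8)
		if (off & 0xff) { *out++ = off & 0xff; cmd |= 0x01 << k; }
	/* emit only the non-zero size bytes, flagging each in bits 4..5 */
	for (k = 0; k < 2; k++, size >>= 8)
		if (size & 0xff) { *out++ = size & 0xff; cmd |= 0x10 << k; }

	*op = cmd;
	return out - op;
}

/*
 * Hypothetical helper: decode one copy op starting at in[0].
 * Returns the number of bytes consumed.
 */
static size_t decode_copy(const unsigned char *in, unsigned int *off, unsigned int *size)
{
	const unsigned char *p = in;
	unsigned char cmd = *p++;
	int k;

	*off = 0;
	*size = 0;
	for (k = 0; k < 4; k++)
		if (cmd & (0x01 << k)) *off |= (unsigned int)*p++ << (8 * k);
	for (k = 0; k < 2; k++)
		if (cmd & (0x10 << k)) *size |= (unsigned int)*p++ << (8 * k);
	if (*size == 0)
		*size = 0x10000;	/* assumption: no size bytes means 64 KiB */
	return p - in;
}
```

Under these assumptions, an offset of 0x010200 with a size of 0x0105 takes five bytes: the command byte 0xb6, then 0x02, 0x01, 0x05, 0x01. The zero low byte of the offset is skipped, which is why sparse offsets and sizes produce shorter ops.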