mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-16 06:03:44 +01:00

192 lines

4 KiB

C


git-pack-objects: write the pack files with a SHA1 csum We want to be able to check their integrity later, and putting the sha1-sum of the contents at the end is a good thing. The writing routines are generic, so we could try to re-use them for the index file, instead of having the same logic duplicated. Update unpack-objects to know about the extra 20 bytes at the end of the index. 2005-06-27 05:27:56 +02:00			`/*`
			`* csum-file.c`
			`*`
			`* Copyright (C) 2005 Linus Torvalds`
			`*`
			`* Simple file write infrastructure for writing SHA1-summed`
			`* files. Useful when you write a file that you want to be`
			`* able to verify hasn't been messed with afterwards.`
			`*/`
			`#include "cache.h"`
add throughput display to git-push This one triggers only when git-pack-objects is called with --all-progress and --stdout which is the combination used by git-push. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-10-30 22:06:21 +01:00			`#include "progress.h"`
			`#include "csum-file.h"`

index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-02-03 02:29:01 +01:00			`static void flush(struct sha1file *f, void *buf, unsigned int count)`
			`{`
			`if (0 <= f->check_fd && count) {`
			`unsigned char check_buffer[8192];`
			`ssize_t ret = read_in_full(f->check_fd, check_buffer, count);`

			`if (ret < 0)`
			`die_errno("%s: sha1 file read error", f->name);`
			`if (ret < count)`
			`die("%s: sha1 file truncated", f->name);`
			`if (memcmp(buf, check_buffer, count))`
			`die("sha1 file '%s' validation error", f->name);`
			`}`

			`for (;;) {`
xread/xwrite: do not worry about EINTR at calling sites. We had errno==EINTR check after read(2)/write(2) sprinkled all over the places, always doing continue. Consolidate them into xread()/xwrite() wrapper routines. Credits for suggestion goes to HPA -- bugs are mine. Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-12-20 01:18:28 +01:00			`int ret = xwrite(f->fd, buf, count);`
			`if (ret > 0) {`
make display of total transferred more accurate The throughput display needs a delay period before accounting and displaying anything. Yet it might be called after some amount of data has already been transferred. The display of total data is therefore accounted late and therefore smaller than the reality. Let's call display_throughput() with an absolute amount of transferred data instead of a relative number, and let the throughput code find the relative amount of data by itself as needed. This way the displayed total is always exact. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-05 04:15:41 +01:00			`f->total += ret;`
			`display_throughput(f->tp, f->total);`
Remove all void-pointer arithmetic. ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-18 17:18:09 +02:00			`buf = (char *) buf + ret;`
			`count -= ret;`
			`if (count)`
			`continue;`
Make sha1flush void and remove conditional return. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-14 22:32:01 +02:00			`return;`
			`}`
			`if (!ret)`
csum-file interface updates: return resulting SHA1 Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to. 2005-06-27 07:01:46 +02:00			`die("sha1 file '%s' write error. Out of diskspace", f->name);`
Convert existing die(..., strerror(errno)) to die_errno() Change calls to die(..., strerror(errno)) to use the new die_errno(). In the process, also make slight style adjustments: at least state _something_ about the function that failed (instead of just printing the pathname), and put paths in single quotes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-27 17:58:46 +02:00			`die_errno("sha1 file '%s' write error", f->name);`
			`}`
			`}`
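The loop in `flush()` handles short writes: it advances the buffer pointer by however many bytes were written and retries until nothing is left. A minimal standalone sketch of that pattern follows. It uses plain `write(2)` instead of git's `xwrite()`, and the helper name `write_fully` and its error convention (return -1 instead of calling `die()`) are assumptions made for illustration, not part of `csum-file.c`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of csum-file.c): keep calling write(2)
 * until `count` bytes have been written. Returns 0 on success and -1 on
 * error with errno set; a zero-byte write is reported as ENOSPC.
 */
static int write_fully(int fd, const void *buf, size_t count)
{
	const char *p = buf;

	assert(buf != NULL || count == 0);
	while (count > 0) {
		ssize_t ret = write(fd, p, count);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry the same write */
			return -1;
		}
		if (ret == 0) {
			errno = ENOSPC;	/* no progress: treat as out of space */
			return -1;
		}
		p += ret;		/* skip the bytes already written */
		count -= (size_t)ret;
	}
	return 0;
}
```

A caller would typically check the return value and report `errno`, which is roughly what the `die()` and `die_errno()` calls do in the original.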

fix pread()'s short read in index-pack Since v1.6.0.2~13^2~ the completion of a thin pack uses sha1write() for its ability to compute a SHA1 on the written data. This also provides data buffering which, along with commit 92392b4a45, will confuse pread() whenever an appended object is 1) freed due to memory pressure because of the depth-first delta processing, and 2) needed again because it has many delta children, and 3) its data is still buffered by sha1write(). Let's fix the issue by simply forcing cached data out when such an object is written so it can be pread()'d at leisure. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-10 04:08:51 +02:00			`void sha1flush(struct sha1file *f)`
			`{`
			`unsigned offset = f->offset;`
Make pack creation always fsync() the result This means that we can depend on packs always being stable on disk, simplifying a lot of the object serialization worries. And unlike loose objects, serializing pack creation IO isn't going to be a performance killer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-30 17:42:16 +02:00
			`if (offset) {`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Update(&f->ctx, f->buffer, offset);`
Merge branch 'maint' * maint: rebase -i: do not fail when there is no commit to cherry-pick test-lib: fix color reset in say_color() fix pread()'s short read in index-pack Conflicts: csum-file.c 2008-10-10 17:39:20 +02:00			`flush(f, f->buffer, offset);`
Alter sha1close() 3rd argument to request flush only update=0 suppressed writing the final SHA-1 but was not used. Now final=0 suppresses SHA-1 finalization, SHA-1 writing, and closing -- in other words, only flush the buffer. Signed-off-by: Dana L. How <danahow@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-13 20:28:19 +02:00			`f->offset = 0;`
			`}`
			`}`

			`int sha1close(struct sha1file *f, unsigned char *result, unsigned int flags)`
			`{`
			`int fd;`

			`sha1flush(f);`
			`git_SHA1_Final(f->buffer, &f->ctx);`
pack-objects: use fixup_pack_header_footer()'s validation mode When limiting the pack size, a new header has to be written to the pack and a new SHA1 computed. Make sure that the SHA1 of what is being read back matches the SHA1 of what was written. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-08-29 22:08:00 +02:00			`if (result)`
			`hashcpy(result, f->buffer);`
			`if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {`
pack-objects.c: fix some global variable abuse and memory leaks To keep things well layered, sha1close() now returns the file descriptor when it doesn't close the file. An ugly cast was added to the return of write_idx_file() to avoid a warning. A proper fix will come separately. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-17 03:55:48 +02:00			`/* write checksum and close fd */`
			`flush(f, f->buffer, 20);`
			`if (flags & CSUM_FSYNC)`
			`fsync_or_die(f->fd, f->name);`
			`if (close(f->fd))`
			`die_errno("%s: sha1 file error on close", f->name);`
			`fd = 0;`
			`} else`
			`fd = f->fd;`
			`if (0 <= f->check_fd) {`
			`char discard;`
			`int cnt = read_in_full(f->check_fd, &discard, 1);`
			`if (cnt < 0)`
			`die_errno("%s: error when reading the tail of sha1 file",`
			`f->name);`
			`if (cnt)`
			`die("%s: sha1 file has trailing garbage", f->name);`
			`if (close(f->check_fd))`
			`die_errno("%s: sha1 file error on close", f->name);`
			`}`
[PATCH] Plug memory leak in sha1close() sha1create() and sha1fd() malloc the returned struct sha1file; sha1close() should free it. Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-08-08 20:46:13 +02:00			`free(f);`
			`return fd;`
			`}`

			`int sha1write(struct sha1file *f, void *buf, unsigned int count)`
			`{`
			`while (count) {`
			`unsigned offset = f->offset;`
			`unsigned left = sizeof(f->buffer) - offset;`
			`unsigned nr = count > left ? left : count;`
sha1write: don't copy full sized buffers No need to memcpy() source buffer data when we might just process the data in place instead of accumulating it into a separate buffer. This is the case when a whole buffer would have been copied, summed, written out and then discarded right away. Also move the CRC32 processing within the loop so the data is more likely to remain in the L1 CPU cache between the CRC32 sum, SHA1 sum and the write call. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-09-02 16:22:20 +02:00			`void *data;`

			`if (f->do_crc)`
			`f->crc32 = crc32(f->crc32, buf, nr);`

			`if (nr == sizeof(f->buffer)) {`
			`/* process full buffer directly without copy */`
			`data = buf;`
			`} else {`
			`memcpy(f->buffer + offset, buf, nr);`
			`data = f->buffer;`
			`}`
			`count -= nr;`
			`offset += nr;`
			`buf = (char *) buf + nr;`
			`left -= nr;`
			`if (!left) {`
			`git_SHA1_Update(&f->ctx, data, offset);`
			`flush(f, data, offset);`
			`offset = 0;`
			`}`
			`f->offset = offset;`
			`}`
			`return 0;`
			`}`
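The "don't copy full sized buffers" change in `sha1write()` stages small writes in the internal buffer but passes input through in place when a single chunk would fill the whole buffer. The sketch below isolates that decision. The struct, the 8-byte buffer size, the `copies` counter and the byte-counting sink are hypothetical stand-ins chosen so the behavior is easy to observe; the real code hashes and writes the data instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUFSZ 8	/* tiny on purpose; the real buffer is much larger */

/* Hypothetical stand-in for struct sha1file; field names are illustrative. */
struct sketch_file {
	unsigned char buffer[SKETCH_BUFSZ];
	size_t offset;		/* bytes currently staged in buffer */
	size_t flushed;		/* total bytes handed to the sink */
	unsigned copies;	/* number of memcpy() calls made */
};

/* Stand-in for "hash and write this chunk"; here it only counts bytes. */
static void sketch_sink(struct sketch_file *f, const void *data, size_t len)
{
	(void)data;
	f->flushed += len;
}

static void sketch_write(struct sketch_file *f, const void *buf, size_t count)
{
	const unsigned char *p = buf;

	assert(buf != NULL || count == 0);
	while (count) {
		size_t left = sizeof(f->buffer) - f->offset;
		size_t nr = count > left ? left : count;
		const void *data;

		if (nr == sizeof(f->buffer)) {
			/* buffer is empty and the input fills it: no copy */
			data = p;
		} else {
			memcpy(f->buffer + f->offset, p, nr);
			f->copies++;
			data = f->buffer;
		}
		count -= nr;
		p += nr;
		f->offset += nr;
		if (f->offset == sizeof(f->buffer)) {
			/* a full buffer's worth is ready: hand it on */
			sketch_sink(f, data, f->offset);
			f->offset = 0;
		}
	}
}
```

Writing 16 bytes into an empty `sketch_file` reaches the sink as two full chunks with no `memcpy()` at all, while a 3-byte write is copied and stays staged until later data fills the buffer.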

csum-file: add "sha1fd()" to create a SHA1 csum file from an existing file descriptor We'll use this soon to write pack-files to stdout. 2005-06-28 20:10:06 +02:00			`struct sha1file *sha1fd(int fd, const char *name)`
			`{`
			`return sha1fd_throughput(fd, name, NULL);`
			`}`

			`struct sha1file *sha1fd_check(const char *name)`
			`{`
			`int sink, check;`
			`struct sha1file *f;`

			`sink = open("/dev/null", O_WRONLY);`
			`if (sink < 0)`
			`return NULL;`
			`check = open(name, O_RDONLY);`
			`if (check < 0) {`
			`int saved_errno = errno;`
			`close(sink);`
			`errno = saved_errno;`
			`return NULL;`
			`}`
			`f = sha1fd(sink, name);`
			`f->check_fd = check;`
			`return f;`
			`}`

			`struct sha1file *sha1fd_throughput(int fd, const char *name, struct progress *tp)`
			`{`
remove dead code from the csum-file interface The provided name argument is always constant and valid in every caller's context, so no need to have an array of PATH_MAX chars to copy it into when a simple pointer will do. Unfortunately that means getting rid of wascally wabbits too. The 'error' field is also unused. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-11-05 04:54:50 +01:00			`struct sha1file *f = xmalloc(sizeof(*f));`
			`f->fd = fd;`
			`f->check_fd = -1;`
			`f->offset = 0;`
			`f->total = 0;`
			`f->tp = tp;`
			`f->name = name;`
compute a CRC32 for each object as stored in a pack The most important optimization for performance when repacking is the ability to reuse data from a previous pack as is and bypass any delta or even SHA1 computation by simply copying the raw data from one pack to another directly. The problem with this is that any data corruption within a copied object would go unnoticed and the new (repacked) pack would be self-consistent with its own checksum despite containing a corrupted object. This is a real issue that already happened at least once in the past. In some attempt to prevent this, we validate the copied data by inflating it and making sure no error is signaled by zlib. But this is still not perfect as a significant portion of a pack content is made of object headers and references to delta base objects which are not deflated and therefore not validated when repacking actually making the pack data reuse still not as safe as it could be. Of course a full SHA1 validation could be performed, but that implies full data inflating and delta replaying which is extremely costly, which cost the data reuse optimization was designed to avoid in the first place. So the best solution to this is simply to store a CRC32 of the raw pack data for each object in the pack index. This way any object in a pack can be validated before being copied as is in another pack, including header and any other non deflated data. Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia: Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very short messages. He wrote "Briefly, the problem is that, for very short packets, Adler32 is guaranteed to give poor coverage of the available bits. Don't take my word for it, ask Mark Adler. :-)" The problem is that sum A does not wrap for short messages. The maximum value of A for a 128-byte message is 32640, which is below the value 65521 used by the modulo operation. 
An extended explanation can be found in RFC 3309, which mandates the use of CRC32 instead of Adler-32 for SCTP, the Stream Control Transmission Protocol. In the context of a GIT pack, we have lots of small objects, especially deltas, which are likely to be quite small and in a size range for which Adler32 is deemed not to be sufficient. Another advantage of CRC32 is the possibility for recovery from certain types of small corruptions like single bit errors which are the most probable type of corruptions. OK what this patch does is to compute the CRC32 of each object written to a pack within pack-objects. It is not written to the index yet and it is obviously not validated when reusing pack data yet either. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:31 +02:00			`f->do_crc = 0;`
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-01 20:05:20 +02:00			`git_SHA1_Init(&f->ctx);`
csum-file: add "sha1fd()" to create a SHA1 csum file from an existing file descriptor We'll use this soon to write pack-files to stdout. 2005-06-28 20:10:06 +02:00			`return f;`
			`}`

csum-file: introduce sha1file_checkpoint It is useful to be able to rewind a check-summed file to a certain previous state after writing data into it using sha1write() API. The fast-import command does this after streaming a blob data to the packfile being generated and then noticing that the same blob has already been written, and it does this with a private code truncate_pack() that is commented as "Yes, this is a layering violation". Introduce two API functions, sha1file_checkpoint(), that allows the caller to save a state of a sha1file, and then later revert it to the saved state. Use it to reimplement truncate_pack(). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-18 01:26:54 +01:00			`void sha1file_checkpoint(struct sha1file *f, struct sha1file_checkpoint *checkpoint)`
			`{`
			`sha1flush(f);`
			`checkpoint->offset = f->total;`
			`checkpoint->ctx = f->ctx;`
			`}`

			`int sha1file_truncate(struct sha1file *f, struct sha1file_checkpoint *checkpoint)`
			`{`
			`off_t offset = checkpoint->offset;`

			`if (ftruncate(f->fd, offset) ||`
			`lseek(f->fd, offset, SEEK_SET) != offset)`
			`return -1;`
			`f->total = offset;`
			`f->ctx = checkpoint->ctx;`
			`f->offset = 0; /* sha1flush() was called in checkpoint */`
			`return 0;`
			`}`
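
The checkpoint/truncate pair above can be illustrated with a minimal, self-contained sketch. Everything named `toy_*` here is hypothetical and not part of git: the SHA-1 context is replaced by a trivial rolling sum, and writes go straight to the descriptor, so there is no buffer to flush before saving the state. It only shows the idea of saving (offset, checksum state) and later rolling both the file and the checksum back together.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for struct sha1file and struct sha1file_checkpoint. */
struct toy_csum_file {
	int fd;
	off_t total;   /* bytes written so far */
	uint32_t sum;  /* toy checksum state, standing in for the SHA-1 context */
};

struct toy_checkpoint {
	off_t offset;
	uint32_t sum;
};

/* Open an anonymous temporary file for the demo; returns -1 on failure. */
static int toy_open_temp(void)
{
	char path[] = "/tmp/toy_csum_XXXXXX";
	int fd = mkstemp(path);
	if (fd >= 0)
		unlink(path); /* file disappears once the descriptor is closed */
	return fd;
}

/* Unbuffered write that also advances the toy checksum and byte count. */
static int toy_write(struct toy_csum_file *f, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	if (write(f->fd, buf, len) != (ssize_t)len)
		return -1;
	for (size_t i = 0; i < len; i++)
		f->sum = f->sum * 31 + p[i];
	f->total += (off_t)len;
	return 0;
}

/* Remember where we are and what the checksum state is. */
static void toy_checkpoint_save(const struct toy_csum_file *f,
				struct toy_checkpoint *cp)
{
	cp->offset = f->total;
	cp->sum = f->sum;
}

/* Cut the file back to the checkpoint and restore the checksum state. */
static int toy_truncate(struct toy_csum_file *f, const struct toy_checkpoint *cp)
{
	assert(cp->offset <= f->total);
	if (ftruncate(f->fd, cp->offset) ||
	    lseek(f->fd, cp->offset, SEEK_SET) != cp->offset)
		return -1;
	f->total = cp->offset;
	f->sum = cp->sum;
	return 0;
}
```

A caller would save a checkpoint before streaming data it may want to discard, then call `toy_truncate()` if the data turns out to be redundant, which is the pattern the commit message describes for fast-import.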

compute a CRC32 for each object as stored in a pack The most important optimization for performance when repacking is the ability to reuse data from a previous pack as is and bypass any delta or even SHA1 computation by simply copying the raw data from one pack to another directly. The problem with this is that any data corruption within a copied object would go unnoticed and the new (repacked) pack would be self-consistent with its own checksum despite containing a corrupted object. This is a real issue that already happened at least once in the past. In some attempt to prevent this, we validate the copied data by inflating it and making sure no error is signaled by zlib. But this is still not perfect as a significant portion of a pack content is made of object headers and references to delta base objects which are not deflated and therefore not validated when repacking actually making the pack data reuse still not as safe as it could be. Of course a full SHA1 validation could be performed, but that implies full data inflating and delta replaying which is extremely costly, which cost the data reuse optimization was designed to avoid in the first place. So the best solution to this is simply to store a CRC32 of the raw pack data for each object in the pack index. This way any object in a pack can be validated before being copied as is in another pack, including header and any other non deflated data. Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia: Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very short messages. He wrote "Briefly, the problem is that, for very short packets, Adler32 is guaranteed to give poor coverage of the available bits. Don't take my word for it, ask Mark Adler. :-)" The problem is that sum A does not wrap for short messages. The maximum value of A for a 128-byte message is 32640, which is below the value 65521 used by the modulo operation. 
An extended explanation can be found in RFC 3309, which mandates the use of CRC32 instead of Adler-32 for SCTP, the Stream Control Transmission Protocol. In the context of a GIT pack, we have lots of small objects, especially deltas, which are likely to be quite small and in a size range for which Adler32 is deemed not to be sufficient. Another advantage of CRC32 is the possibility for recovery from certain types of small corruptions like single bit errors which are the most probable type of corruptions. OK what this patch does is to compute the CRC32 of each object written to a pack within pack-objects. It is not written to the index yet and it is obviously not validated when reusing pack data yet either. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:31 +02:00			`void crc32_begin(struct sha1file *f)`
			`{`
sparse: Fix errors and silence warnings * load_file() returns a void pointer but is using 0 for the return value * builtin/receive-pack.c forgot to include builtin.h * packet_trace_prefix can be marked static * ll_merge takes a pointer for its last argument, not an int * crc32 expects a pointer as the second argument but Z_NULL is defined to be 0 (see 38f4d13 sparse fix: Using plain integer as NULL pointer, 2006-11-18 for more info) Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-04-03 09:06:54 +02:00			`f->crc32 = crc32(0, NULL, 0);`
compute a CRC32 for each object as stored in a pack The most important optimization for performance when repacking is the ability to reuse data from a previous pack as is and bypass any delta or even SHA1 computation by simply copying the raw data from one pack to another directly. The problem with this is that any data corruption within a copied object would go unnoticed and the new (repacked) pack would be self-consistent with its own checksum despite containing a corrupted object. This is a real issue that already happened at least once in the past. In some attempt to prevent this, we validate the copied data by inflating it and making sure no error is signaled by zlib. But this is still not perfect as a significant portion of a pack content is made of object headers and references to delta base objects which are not deflated and therefore not validated when repacking actually making the pack data reuse still not as safe as it could be. Of course a full SHA1 validation could be performed, but that implies full data inflating and delta replaying which is extremely costly, which cost the data reuse optimization was designed to avoid in the first place. So the best solution to this is simply to store a CRC32 of the raw pack data for each object in the pack index. This way any object in a pack can be validated before being copied as is in another pack, including header and any other non deflated data. Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia: Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very short messages. He wrote "Briefly, the problem is that, for very short packets, Adler32 is guaranteed to give poor coverage of the available bits. Don't take my word for it, ask Mark Adler. :-)" The problem is that sum A does not wrap for short messages. The maximum value of A for a 128-byte message is 32640, which is below the value 65521 used by the modulo operation. 
An extended explanation can be found in RFC 3309, which mandates the use of CRC32 instead of Adler-32 for SCTP, the Stream Control Transmission Protocol. In the context of a GIT pack, we have lots of small objects, especially deltas, which are likely to be quite small and in a size range for which Adler32 is deemed not to be sufficient. Another advantage of CRC32 is the possibility for recovery from certain types of small corruptions like single bit errors which are the most probable type of corruptions. OK what this patch does is to compute the CRC32 of each object written to a pack within pack-objects. It is not written to the index yet and it is obviously not validated when reusing pack data yet either. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:31 +02:00			`f->do_crc = 1;`
			`}`
git-pack-objects: write the pack files with a SHA1 csum We want to be able to check their integrity later, and putting the sha1-sum of the contents at the end is a good thing. The writing routines are generic, so we could try to re-use them for the index file, instead of having the same logic duplicated. Update unpack-objects to know about the extra 20 bytes at the end of the index. 2005-06-27 05:27:56 +02:00
compute a CRC32 for each object as stored in a pack The most important optimization for performance when repacking is the ability to reuse data from a previous pack as is and bypass any delta or even SHA1 computation by simply copying the raw data from one pack to another directly. The problem with this is that any data corruption within a copied object would go unnoticed and the new (repacked) pack would be self-consistent with its own checksum despite containing a corrupted object. This is a real issue that already happened at least once in the past. In some attempt to prevent this, we validate the copied data by inflating it and making sure no error is signaled by zlib. But this is still not perfect as a significant portion of a pack content is made of object headers and references to delta base objects which are not deflated and therefore not validated when repacking actually making the pack data reuse still not as safe as it could be. Of course a full SHA1 validation could be performed, but that implies full data inflating and delta replaying which is extremely costly, which cost the data reuse optimization was designed to avoid in the first place. So the best solution to this is simply to store a CRC32 of the raw pack data for each object in the pack index. This way any object in a pack can be validated before being copied as is in another pack, including header and any other non deflated data. Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia: Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very short messages. He wrote "Briefly, the problem is that, for very short packets, Adler32 is guaranteed to give poor coverage of the available bits. Don't take my word for it, ask Mark Adler. :-)" The problem is that sum A does not wrap for short messages. The maximum value of A for a 128-byte message is 32640, which is below the value 65521 used by the modulo operation. 
An extended explanation can be found in RFC 3309, which mandates the use of CRC32 instead of Adler-32 for SCTP, the Stream Control Transmission Protocol. In the context of a GIT pack, we have lots of small objects, especially deltas, which are likely to be quite small and in a size range for which Adler32 is deemed not to be sufficient. Another advantage of CRC32 is the possibility for recovery from certain types of small corruptions like single bit errors which are the most probable type of corruptions. OK what this patch does is to compute the CRC32 of each object written to a pack within pack-objects. It is not written to the index yet and it is obviously not validated when reusing pack data yet either. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-09 07:06:31 +02:00			`uint32_t crc32_end(struct sha1file *f)`
			`{`
			`f->do_crc = 0;`
			`return f->crc32;`
			`}`
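
The begin/end bracket above is meant to wrap the writes belonging to one object, so that only those bytes contribute to the per-object CRC32. The following is a rough sketch of that usage pattern under stated assumptions: the `toy_*` names are hypothetical, the struct carries only the two CRC-related fields, and a small bitwise CRC-32 stands in for zlib's `crc32()` so the example builds without linking zlib.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sha1file: only the CRC-related fields. */
struct toy_file {
	uint32_t crc32;
	int do_crc;
};

/*
 * Bitwise CRC-32 (reflected polynomial 0xEDB88320). Like zlib's crc32(),
 * it starts from 0 and can be chained across successive buffers.
 */
static uint32_t toy_crc32(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Start accumulating a CRC over subsequent writes. */
static void toy_crc32_begin(struct toy_file *f)
{
	f->crc32 = 0; /* initial CRC value */
	f->do_crc = 1;
}

/* Stop accumulating and hand back the CRC of the bracketed writes. */
static uint32_t toy_crc32_end(struct toy_file *f)
{
	f->do_crc = 0;
	return f->crc32;
}

/* A write path that folds data into the CRC only while do_crc is set. */
static void toy_write(struct toy_file *f, const void *buf, size_t len)
{
	if (f->do_crc)
		f->crc32 = toy_crc32(f->crc32, buf, len);
	/* a real writer would also hash and buffer the data here */
}
```

Writes issued before `toy_crc32_begin()` or after `toy_crc32_end()` leave the CRC untouched, and splitting one object across several writes gives the same result as a single write of the concatenated bytes.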