mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-18 15:04:49 +01:00

278 lines

5.9 KiB

C

Raw Normal View History

[PATCH] Compilation: zero-length array declaration. ISO C99 (and GCC 3.x or later) lets you write a flexible array at the end of a structure, like this: struct frotz { int xyzzy; char nitfol[]; /* more / }; GCC 2.95 and 2.96 let you to do this with "char nitfol[0]"; unfortunately this is not allowed by ISO C90. This declares such construct like this: struct frotz { int xyzzy; char nitfol[FLEX_ARRAY]; / more */ }; and git-compat-util.h defines FLEX_ARRAY to 0 for gcc 2.95 and empty for others. If you are using a C90 C compiler, you should be able to override this with CFLAGS=-DFLEX_ARRAY=1 from the command line of "make". Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-01-07 10:33:54 +01:00			`#include "cache.h"`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00			`#include "object.h"`
[PATCH] Add function to parse an object of unspecified type (take 2) This adds a function that parses an object from the database when we have to look up its actual type. It also checks the hash of the file, due to its heritage as part of fsck-cache. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-28 16:46:33 +02:00			`#include "blob.h"`
			`#include "tree.h"`
			`#include "commit.h"`
			`#include "tag.h"`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`static struct object **obj_hash;`
			`static int nr_objs, obj_hash_size;`
Abstract out accesses to object hash array There are a few special places where some programs accessed the object hash array directly, which bothered me because I wanted to play with some simple re-organizations. So this patch makes the object hash array data structures all entirely local to object.c, and the few users who wanted to look at it now get to use a function to query how many object index entries there can be, and to actually access the array. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 06:38:55 +02:00
			`unsigned int get_max_object_index(void)`
			`{`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`return obj_hash_size;`
Abstract out accesses to object hash array There are a few special places where some programs accessed the object hash array directly, which bothered me because I wanted to play with some simple re-organizations. So this patch makes the object hash array data structures all entirely local to object.c, and the few users who wanted to look at it now get to use a function to query how many object index entries there can be, and to actually access the array. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 06:38:55 +02:00			`}`

			`struct object *get_indexed_object(unsigned int idx)`
			`{`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`return obj_hash[idx];`
Abstract out accesses to object hash array There are a few special places where some programs accessed the object hash array directly, which bothered me because I wanted to play with some simple re-organizations. So this patch makes the object hash array data structures all entirely local to object.c, and the few users who wanted to look at it now get to use a function to query how many object index entries there can be, and to actually access the array. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 06:38:55 +02:00			`}`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00
formalize typename(), and add its reverse type_from_string() Sometime typename() is used, sometimes type_names[] is accessed directly. Let's enforce typename() all the time which allows for validating the type. Also let's add a function to go from a name to a type and use it instead of manual memcpy() when appropriate. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:58 +01:00			`static const char *object_type_strings[] = {`
			`NULL, /* OBJ_NONE = 0 */`
			`"commit", /* OBJ_COMMIT = 1 */`
			`"tree", /* OBJ_TREE = 2 */`
			`"blob", /* OBJ_BLOB = 3 */`
			`"tag", /* OBJ_TAG = 4 */`
Shrink "struct object" a bit This shrinks "struct object" by a small amount, by getting rid of the "struct type *" pointer and replacing it with a 3-bit bitfield instead. In addition, we merge the bitfields and the "flags" field, which incidentally should also remove a useless 4-byte padding from the object when in 64-bit mode. Now, our "struct object" is still too damn large, but it's now less obviously bloated, and of the remaining fields, only the "util" (which is not used by most things) is clearly something that should be eventually discarded. This shrinks the "git-rev-list --all" memory use by about 2.5% on the kernel archive (and, perhaps more importantly, on the larger mozilla archive). That may not sound like much, but I suspect it's more on a 64-bit platform. There are other remaining inefficiencies (the parent lists, for example, probably have horrible malloc overhead), but this was pretty obvious. Most of the patch is just changing the comparison of the "type" pointer from one of the constant string pointers to the appropriate new TYPE_xxx small integer constant. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-15 01:45:13 +02:00			`};`

formalize typename(), and add its reverse type_from_string() Sometime typename() is used, sometimes type_names[] is accessed directly. Let's enforce typename() all the time which allows for validating the type. Also let's add a function to go from a name to a type and use it instead of manual memcpy() when appropriate. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:58 +01:00			`const char *typename(unsigned int type)`
			`{`
			`if (type >= ARRAY_SIZE(object_type_strings))`
			`return NULL;`
			`return object_type_strings[type];`
			`}`

			`int type_from_string(const char *str)`
			`{`
			`int i;`

			`for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)`
			`if (!strcmp(str, object_type_strings[i]))`
			`return i;`
			`die("invalid object type \"%s\"", str);`
			`}`

git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`static unsigned int hash_obj(struct object *obj, unsigned int n)`
			`{`
Fix type-punning issues In these two places we are casting part of our unsigned char sha1 array into an unsigned int, which violates GCCs strict-aliasing rules (and probably other compilers). Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-05-12 03:17:38 +02:00			`unsigned int hash;`
			`memcpy(&hash, obj->sha1, sizeof(unsigned int));`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`return hash % n;`
			`}`

			`static void insert_obj_hash(struct object obj, struct object *hash, unsigned int size)`
			`{`
Unify signedness in hashing calls Our hash_obj and hashtable_index calls and functions were doing a lot of funny things with signedness. Unify all of it to 'unsigned int'. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-05-19 06:34:02 +02:00			`unsigned int j = hash_obj(obj, size);`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00
			`while (hash[j]) {`
			`j++;`
			`if (j >= size)`
			`j = 0;`
			`}`
			`hash[j] = obj;`
			`}`

Unify signedness in hashing calls Our hash_obj and hashtable_index calls and functions were doing a lot of funny things with signedness. Unify all of it to 'unsigned int'. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-05-19 06:34:02 +02:00			`static unsigned int hashtable_index(const unsigned char *sha1)`
Use a hashtable for objects instead of a sorted list In a simple test, this brings down the CPU time from 47 sec to 22 sec. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:57:57 +01:00			`{`
hashtable-based objects: minimum fixups. Calling hashtable_index from find_object before objs is created would result in division by zero failure. Avoid it. Also the given object name may not be aligned suitably for unsigned int; avoid dereferencing casted pointer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 03:51:19 +01:00			`unsigned int i;`
			`memcpy(&i, sha1, sizeof(unsigned int));`
Unify signedness in hashing calls Our hash_obj and hashtable_index calls and functions were doing a lot of funny things with signedness. Unify all of it to 'unsigned int'. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-05-19 06:34:02 +02:00			`return i % obj_hash_size;`
Use a hashtable for objects instead of a sorted list In a simple test, this brings down the CPU time from 47 sec to 22 sec. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:57:57 +01:00			`}`

git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`struct object lookup_object(const unsigned char sha1)`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00			`{`
Unify signedness in hashing calls Our hash_obj and hashtable_index calls and functions were doing a lot of funny things with signedness. Unify all of it to 'unsigned int'. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-05-19 06:34:02 +02:00			`unsigned int i;`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`struct object *obj;`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`if (!obj_hash)`
			`return NULL;`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00
hashtable-based objects: minimum fixups. Calling hashtable_index from find_object before objs is created would result in division by zero failure. Avoid it. Also the given object name may not be aligned suitably for unsigned int; avoid dereferencing casted pointer. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 03:51:19 +01:00			`i = hashtable_index(sha1);`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`while ((obj = obj_hash[i]) != NULL) {`
Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char sha1, const unsigned char sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-17 20:54:57 +02:00			`if (!hashcmp(sha1, obj->sha1))`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`break;`
Use a hashtable for objects instead of a sorted list In a simple test, this brings down the CPU time from 47 sec to 22 sec. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:57:57 +01:00			`i++;`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`if (i == obj_hash_size)`
Use a hashtable for objects instead of a sorted list In a simple test, this brings down the CPU time from 47 sec to 22 sec. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-12 02:57:57 +01:00			`i = 0;`
			`}`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`return obj;`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00			`}`

git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`static void grow_object_hash(void)`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00			`{`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`int i;`
			`int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;`
			`struct object **new_hash;`

Use xcalloc instead of calloc Signed-off-by: Jonas Fonseca <fonseca@diku.dk> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-28 02:26:07 +02:00			`new_hash = xcalloc(new_hash_size, sizeof(struct object *));`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`for (i = 0; i < obj_hash_size; i++) {`
			`struct object *obj = obj_hash[i];`
			`if (!obj)`
			`continue;`
			`insert_obj_hash(obj, new_hash, new_hash_size);`
			`}`
			`free(obj_hash);`
			`obj_hash = new_hash;`
			`obj_hash_size = new_hash_size;`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00			`}`

Clean up object creation to use more common code This replaces the fairly odd "created_object()" function that did _most_ of the object setup with a more complete "create_object()" function that also has a more natural calling convention. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-17 07:11:43 +02:00			`void create_object(const unsigned char sha1, int type, void *o)`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00			`{`
Clean up object creation to use more common code This replaces the fairly odd "created_object()" function that did _most_ of the object setup with a more complete "create_object()" function that also has a more natural calling convention. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-17 07:11:43 +02:00			`struct object *obj = o;`

[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00			`obj->parsed = 0;`
			`obj->used = 0;`
Clean up object creation to use more common code This replaces the fairly odd "created_object()" function that did _most_ of the object setup with a more complete "create_object()" function that also has a more natural calling convention. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-17 07:11:43 +02:00			`obj->type = type;`
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`obj->flags = 0;`
Convert memcpy(a,b,20) to hashcpy(a,b). This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void. This is a reasonable tradeoff as most call sites already use unsigned char and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-23 08:49:00 +02:00			`hashcpy(obj->sha1, sha1);`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`if (obj_hash_size - 1 <= nr_objs * 2)`
			`grow_object_hash();`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00
git object hash cleanups This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-30 20:20:33 +02:00			`insert_obj_hash(obj, obj_hash, obj_hash_size);`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00			`nr_objs++;`
Clean up object creation to use more common code This replaces the fairly odd "created_object()" function that did _most_ of the object setup with a more complete "create_object()" function that also has a more natural calling convention. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-17 07:11:43 +02:00			`return obj;`
[PATCH] Implementations of parsing functions This implements the parsing functions. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-18 20:39:48 +02:00			`}`

[PATCH] Object library enhancements Add function to look up an object which is entirely unknown, so that it can be put in a list. Various other functions related to lists of objects. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-08-03 01:45:48 +02:00			`struct object lookup_unknown_object(const unsigned char sha1)`
			`{`
			`struct object *obj = lookup_object(sha1);`
Clean up object creation to use more common code This replaces the fairly odd "created_object()" function that did _most_ of the object setup with a more complete "create_object()" function that also has a more natural calling convention. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-17 07:11:43 +02:00			`if (!obj)`
			`obj = create_object(sha1, OBJ_NONE, alloc_object_node());`
[PATCH] Object library enhancements Add function to look up an object which is entirely unknown, so that it can be put in a list. Various other functions related to lists of objects. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-08-03 01:45:48 +02:00			`return obj;`
			`}`

convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`struct object parse_object_buffer(const unsigned char sha1, enum object_type type, unsigned long size, void buffer, int eaten_p)`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`{`
			`struct object *obj;`
			`int eaten = 0;`

Don't dereference NULL upon lookup failure. Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-21 11:56:32 +01:00			`obj = NULL;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`if (type == OBJ_BLOB) {`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`struct blob *blob = lookup_blob(sha1);`
Don't dereference NULL upon lookup failure. Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-21 11:56:32 +01:00			`if (blob) {`
parse_object_buffer: don't ignore errors from the object specific parsing functions In the case of an malformed object, the object specific parsing functions would return an error, which is currently ignored. The object can be partial initialized in this case. This patch make parse_object_buffer propagate such errors. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-03 22:22:39 +01:00			`if (parse_blob_buffer(blob, buffer, size))`
			`return NULL;`
Don't dereference NULL upon lookup failure. Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-21 11:56:32 +01:00			`obj = &blob->object;`
			`}`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`} else if (type == OBJ_TREE) {`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`struct tree *tree = lookup_tree(sha1);`
Don't dereference NULL upon lookup failure. Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-21 11:56:32 +01:00			`if (tree) {`
			`obj = &tree->object;`
receive-pack, fetch-pack: reject bogus pack that records objects twice When receive-pack & fetch-pack are run and store the pack obtained over the wire to a local repository, they internally run the index-pack command with the --strict option. Make sure that we reject incoming packfile that records objects twice to avoid spreading such a damage. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-17 07:04:13 +01:00			`if (!tree->buffer)`
			`tree->object.parsed = 0;`
Don't dereference NULL upon lookup failure. Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-21 11:56:32 +01:00			`if (!tree->object.parsed) {`
parse_object_buffer: don't ignore errors from the object specific parsing functions In the case of an malformed object, the object specific parsing functions would return an error, which is currently ignored. The object can be partial initialized in this case. This patch make parse_object_buffer propagate such errors. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-03 22:22:39 +01:00			`if (parse_tree_buffer(tree, buffer, size))`
			`return NULL;`
Don't dereference NULL upon lookup failure. Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-21 11:56:32 +01:00			`eaten = 1;`
			`}`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`}`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`} else if (type == OBJ_COMMIT) {`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`struct commit *commit = lookup_commit(sha1);`
Don't dereference NULL upon lookup failure. Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-21 11:56:32 +01:00			`if (commit) {`
parse_object_buffer: don't ignore errors from the object specific parsing functions In the case of an malformed object, the object specific parsing functions would return an error, which is currently ignored. The object can be partial initialized in this case. This patch make parse_object_buffer propagate such errors. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-03 22:22:39 +01:00			`if (parse_commit_buffer(commit, buffer, size))`
			`return NULL;`
Don't dereference NULL upon lookup failure. Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-21 11:56:32 +01:00			`if (!commit->buffer) {`
			`commit->buffer = buffer;`
			`eaten = 1;`
			`}`
			`obj = &commit->object;`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`}`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`} else if (type == OBJ_TAG) {`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`struct tag *tag = lookup_tag(sha1);`
Don't dereference NULL upon lookup failure. Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-21 11:56:32 +01:00			`if (tag) {`
parse_object_buffer: don't ignore errors from the object specific parsing functions In the case of an malformed object, the object specific parsing functions would return an error, which is currently ignored. The object can be partial initialized in this case. This patch make parse_object_buffer propagate such errors. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-02-03 22:22:39 +01:00			`if (parse_tag_buffer(tag, buffer, size))`
			`return NULL;`
Don't dereference NULL upon lookup failure. Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-12-21 11:56:32 +01:00			`obj = &tag->object;`
			`}`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`} else {`
Don't assume tree entries that are not dirs are blobs When scanning the trees in track_tree_refs() there is a "lazy" test that assumes that entries are either directories or files. Don't do that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-06 12:25:17 +02:00			`warning("object %s has unknown type id %d\n", sha1_to_hex(sha1), type);`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`obj = NULL;`
			`}`
Don't assume tree entries that are not dirs are blobs When scanning the trees in track_tree_refs() there is a "lazy" test that assumes that entries are either directories or files. Don't do that. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-06-06 12:25:17 +02:00			`if (obj && obj->type == OBJ_NONE)`
			`obj->type = type;`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`*eaten_p = eaten;`
			`return obj;`
			`}`

[PATCH] Anal retentive 'const unsigned char *sha1' Make 'sha1' parameters const where possible Signed-off-by: Jason McMullan <jason.mcmullan@timesys.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-03 17:05:39 +02:00			`struct object parse_object(const unsigned char sha1)`
[PATCH] Add function to parse an object of unspecified type (take 2) This adds a function that parses an object from the database when we have to look up its actual type. It also checks the hash of the file, due to its heritage as part of fsck-cache. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-28 16:46:33 +02:00			`{`
[PATCH] Remove "delta" object representation. Packed delta files created by git-pack-objects seems to be the way to go, and existing "delta" object handling code has exposed the object representation details to too many places. Remove it while we refactor code to come up with a proper interface in sha1_file.c. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-27 12:33:33 +02:00			`unsigned long size;`
convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-26 20:55:59 +01:00			`enum object_type type;`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`int eaten;`
read_sha1_file(): get rid of read_sha1_file_repl() madness Most callers want to silently get a replacement object, and they do not care what the real name of the replacement object is. Worse yet, no sane interface to return the underlying object without replacement is provided. Remove the function and make only the few callers that want the name of the replacement object find it themselves. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-15 21:54:52 +02:00			`const unsigned char *repl = lookup_replace_object(sha1);`
parse_object: try internal cache before reading object db When parse_object is called, we do the following: 1. read the object data into a buffer via read_sha1_file 2. call parse_object_buffer, which then: a. calls the appropriate lookup_{commit,tree,blob,tag} to either create a new "struct object", or to find an existing one. We know the appropriate type from the lookup in step 1. b. calls the appropriate parse_{commit,tree,blob,tag} to parse the buffer for the new (or existing) object In step 2b, all of the called functions are no-ops for object "X" if "X->object.parsed" is set. I.e., when we have already parsed an object, we end up going to a lot of work just to find out at a low level that there is nothing left for us to do (and we throw away the data from read_sha1_file unread). We can optimize this by moving the check for "do we have an in-memory object" from 2a before the expensive call to read_sha1_file in step 1. This might seem circular, since step 2a uses the type information determined in step 1 to call the appropriate lookup function. However, we can notice that all of the lookup_* functions are backed by lookup_object. In other words, all of the objects are kept in a master hash table, and we don't actually need the type to do the "do we have it" part of the lookup, only to do the "and create it if it doesn't exist" part. This can save time whenever we call parse_object on the same sha1 twice in a single program. Some code paths already perform this optimization manually, with either: if (!obj->parsed) obj = parse_object(obj->sha1); if you already have a "struct object", or: struct object *obj = lookup_unknown_object(sha1); if (!obj \|\| !obj->parsed) obj = parse_object(sha1); if you don't. This patch moves the optimization into parse_object itself. Most git operations won't notice any impact. Either they don't parse a lot of duplicate sha1s, or the calling code takes special care not to re-parse objects. I timed two code paths that do benefit (there may be more, but these two were immediately obvious and easy to time). The first is fast-export, which calls parse_object on each object it outputs, like this: object = parse_object(sha1); if (!object) die(...); if (object->flags & SHOWN) return; which means that just to realize we have already shown an object, we will read the whole object from disk! With this patch, my best-of-five time for "fast-export --all" on git.git dropped from 26.3s to 21.3s. The second case is upload-pack, which will call parse_object for each advertised ref (because it needs to peel tags to show "^{}" entries). This doesn't matter for most repositories, because they don't have a lot of refs pointing to the same objects. However, if you have a big alternates repository with a shared object db for a number of child repositories, then the alternates repository will have duplicated refs representing each of its children. For example, GitHub's alternates repository for git.git has ~120,000 refs, of which only ~3200 are unique. The time for upload-pack to print its list of advertised refs dropped from 3.4s to 0.76s. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-05 22:00:01 +01:00			`void *buffer;`
			`struct object *obj;`

			`obj = lookup_object(sha1);`
			`if (obj && obj->parsed)`
			`return obj;`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00
parse_object: try internal cache before reading object db When parse_object is called, we do the following: 1. read the object data into a buffer via read_sha1_file 2. call parse_object_buffer, which then: a. calls the appropriate lookup_{commit,tree,blob,tag} to either create a new "struct object", or to find an existing one. We know the appropriate type from the lookup in step 1. b. calls the appropriate parse_{commit,tree,blob,tag} to parse the buffer for the new (or existing) object In step 2b, all of the called functions are no-ops for object "X" if "X->object.parsed" is set. I.e., when we have already parsed an object, we end up going to a lot of work just to find out at a low level that there is nothing left for us to do (and we throw away the data from read_sha1_file unread). We can optimize this by moving the check for "do we have an in-memory object" from 2a before the expensive call to read_sha1_file in step 1. This might seem circular, since step 2a uses the type information determined in step 1 to call the appropriate lookup function. However, we can notice that all of the lookup_* functions are backed by lookup_object. In other words, all of the objects are kept in a master hash table, and we don't actually need the type to do the "do we have it" part of the lookup, only to do the "and create it if it doesn't exist" part. This can save time whenever we call parse_object on the same sha1 twice in a single program. Some code paths already perform this optimization manually, with either: if (!obj->parsed) obj = parse_object(obj->sha1); if you already have a "struct object", or: struct object *obj = lookup_unknown_object(sha1); if (!obj \|\| !obj->parsed) obj = parse_object(sha1); if you don't. This patch moves the optimization into parse_object itself. Most git operations won't notice any impact. Either they don't parse a lot of duplicate sha1s, or the calling code takes special care not to re-parse objects. I timed two code paths that do benefit (there may be more, but these two were immediately obvious and easy to time). The first is fast-export, which calls parse_object on each object it outputs, like this: object = parse_object(sha1); if (!object) die(...); if (object->flags & SHOWN) return; which means that just to realize we have already shown an object, we will read the whole object from disk! With this patch, my best-of-five time for "fast-export --all" on git.git dropped from 26.3s to 21.3s. The second case is upload-pack, which will call parse_object for each advertised ref (because it needs to peel tags to show "^{}" entries). This doesn't matter for most repositories, because they don't have a lot of refs pointing to the same objects. However, if you have a big alternates repository with a shared object db for a number of child repositories, then the alternates repository will have duplicated refs representing each of its children. For example, GitHub's alternates repository for git.git has ~120,000 refs, of which only ~3200 are unique. The time for upload-pack to print its list of advertised refs dropped from 3.4s to 0.76s. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-01-05 22:00:01 +01:00			`buffer = read_sha1_file(sha1, &type, &size);`
[PATCH] Remove "delta" object representation. Packed delta files created by git-pack-objects seems to be the way to go, and existing "delta" object handling code has exposed the object representation details to too many places. Remove it while we refactor code to come up with a proper interface in sha1_file.c. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-06-27 12:33:33 +02:00			`if (buffer) {`
object: call "check_sha1_signature" with the replacement sha1 Otherwise we get a "sha1 mismatch" error for replaced objects. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-23 10:07:10 +01:00			`if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) {`
fix memory leak in parse_object when check_sha1_signature fails When check_sha1_signature fails, program is not terminated: it prints an error message and returns NULL, so the buffer returned by read_sha1_file should be freed before. Signed-off-by: Carlos Rica <jasampler@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-25 03:46:22 +02:00			`free(buffer);`
object: call "check_sha1_signature" with the replacement sha1 Otherwise we get a "sha1 mismatch" error for replaced objects. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-23 10:07:10 +01:00			`error("sha1 mismatch %s\n", sha1_to_hex(repl));`
Don't ever return corrupt objects from "parse_object()" Looking at the SHA1 validation code due to the corruption that Alexander Litvinov is seeing under Cygwin, I notice that one of the most central places where we read objects, we actually do end up verifying the SHA1 of the result, but then we happily parse it anyway. And using "printf" to write the error message means that it not only can get lost, but will actually mess up stdout, and cause other strange and hard-to-debug failures downstream. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-03-20 18:05:20 +01:00			`return NULL;`
			`}`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00
parse_object: pass on the original sha1, not the replaced one Commit 0e87c36 (object: call "check_sha1_signature" with the replacement sha1) changed the first argument passed to parse_object_buffer() from "sha1" to "repl". With that change, the returned obj pointer has the replacement SHA1 in obj->sha1, not the original one. But when using lookup_commit() and then parse_commit() on a commit, we get an object pointer with the original sha1, but the commit content comes from the replacement commit. So the result we get from using parse_object() is different from the we get from using lookup_commit() followed by parse_commit(). It looks much simpler and safer to fix this inconsistency by passing "sha1" to parse_object_bufer() instead of "repl". The commit comment should be used to tell the the replacement commit is replacing another commit and why. So it should be easy to see that we have a replacement commit instead of an original one. And it is not a problem if the content of the commit is not consistent with the sha1 as cat-file piped to hash-object can be used to see the difference. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-09-03 22:51:53 +02:00			`obj = parse_object_buffer(sha1, type, size, buffer, &eaten);`
Add git-for-each-ref: helper for language bindings This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-09-15 22:30:02 +02:00			`if (!eaten)`
			`free(buffer);`
[PATCH] don't load and decompress objects twice with parse_object() It turns out that parse_object() is loading and decompressing given object to free it just before calling the specific object parsing function which does mmap and decompress the same object again. This patch introduces the ability to parse specific objects directly from a memory buffer. Without this patch, running git-fsck-cache on the kernel repositorytake: real 0m13.006s user 0m11.421s sys 0m1.218s With this patch applied: real 0m8.060s user 0m7.071s sys 0m0.710s The performance increase is significant, and this is kind of a prerequisite for sane delta object support with fsck. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-05-06 19:48:34 +02:00			`return obj;`
[PATCH] Add function to parse an object of unspecified type (take 2) This adds a function that parses an object from the database when we have to look up its actual type. It also checks the hash of the file, due to its heritage as part of fsck-cache. Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-04-28 16:46:33 +02:00			`}`
			`return NULL;`
			`}`
[PATCH] Object library enhancements Add function to look up an object which is entirely unknown, so that it can be put in a list. Various other functions related to lists of objects. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-08-03 01:45:48 +02:00
			`struct object_list object_list_insert(struct object item,`
			`struct object_list **list_p)`
			`{`
			`struct object_list *new_list = xmalloc(sizeof(struct object_list));`
Fix whitespace issue in object.c Change some expanded tabs (spaces) to tabs in object.c. Signed-off-by: Jared Hance <jaredhance@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-09-05 21:36:33 +02:00			`new_list->item = item;`
			`new_list->next = *list_p;`
			`*list_p = new_list;`
			`return new_list;`
[PATCH] Object library enhancements Add function to look up an object which is entirely unknown, so that it can be put in a list. Various other functions related to lists of objects. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2005-08-03 01:45:48 +02:00			`}`

			`int object_list_contains(struct object_list list, struct object obj)`
			`{`
			`while (list) {`
			`if (list->item == obj)`
			`return 1;`
			`list = list->next;`
			`}`
			`return 0;`
			`}`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00
			`void add_object_array(struct object obj, const char name, struct object_array *array)`
add add_object_array_with_mode Each object in struct object_array is extended with the mode. If not specified, S_IFINVALID is used. An object with an mode value can be added with add_object_array_with_mode. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 18:43:58 +02:00			`{`
			`add_object_array_with_mode(obj, name, array, S_IFINVALID);`
			`}`

			`void add_object_array_with_mode(struct object obj, const char name, struct object_array *array, unsigned mode)`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`{`
			`unsigned nr = array->nr;`
			`unsigned alloc = array->alloc;`
			`struct object_array_entry *objects = array->objects;`

			`if (nr >= alloc) {`
			`alloc = (alloc + 32) * 2;`
			`objects = xrealloc(objects, alloc * sizeof(*objects));`
			`array->alloc = alloc;`
			`array->objects = objects;`
			`}`
			`objects[nr].item = obj;`
			`objects[nr].name = name;`
add add_object_array_with_mode Each object in struct object_array is extended with the mode. If not specified, S_IFINVALID is used. An object with an mode value can be added with add_object_array_with_mode. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 18:43:58 +02:00			`objects[nr].mode = mode;`
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-20 02:42:35 +02:00			`array->nr = ++nr;`
			`}`
bundle: allow the same ref to be given more than once "git bundle create x master master" used to create a bundle that lists the same branch (master) twice. Cloning from such a bundle resulted in a needless warning "warning: Duplicated ref: refs/remotes/origin/master". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-18 07:27:08 +01:00
			`void object_array_remove_duplicates(struct object_array *array)`
			`{`
fix "bundle --stdin" segfault When passed an empty list, objects_array_remove_duplicates() corrupts it by changing the number of entries from 0 to 1. The problem lies in the condition of its main loop: for (ref = 0; ref < array->nr - 1; ref++) { The loop body manipulates the supplied object array. In the case of an empty array, it should not be doing anything at all. But array->nr is an unsigned quantity, so the code enters the loop, in particular increasing array->nr. Fix this by comparing (ref + 1 < array->nr) instead. This bug can be triggered by git bundle --stdin: $ echo HEAD \| git bundle create some.bundle --stdin’ Segmentation fault (core dumped) The list of commits to bundle appears to be empty because of another bug: by the time the revision-walking machinery gets to look at it, standard input has already been consumed by rev-list, so this function gets an empty list of revisions. After this patch, git bundle --stdin still does not work; it just doesn’t segfault any more. Reported-by: Joey Hess <joey@kitenet.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-19 10:03:40 +02:00			`unsigned int ref, src, dst;`
bundle: allow the same ref to be given more than once "git bundle create x master master" used to create a bundle that lists the same branch (master) twice. Cloning from such a bundle resulted in a needless warning "warning: Duplicated ref: refs/remotes/origin/master". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-18 07:27:08 +01:00			`struct object_array_entry *objects = array->objects;`

fix "bundle --stdin" segfault When passed an empty list, objects_array_remove_duplicates() corrupts it by changing the number of entries from 0 to 1. The problem lies in the condition of its main loop: for (ref = 0; ref < array->nr - 1; ref++) { The loop body manipulates the supplied object array. In the case of an empty array, it should not be doing anything at all. But array->nr is an unsigned quantity, so the code enters the loop, in particular increasing array->nr. Fix this by comparing (ref + 1 < array->nr) instead. This bug can be triggered by git bundle --stdin: $ echo HEAD \| git bundle create some.bundle --stdin’ Segmentation fault (core dumped) The list of commits to bundle appears to be empty because of another bug: by the time the revision-walking machinery gets to look at it, standard input has already been consumed by rev-list, so this function gets an empty list of revisions. After this patch, git bundle --stdin still does not work; it just doesn’t segfault any more. Reported-by: Joey Hess <joey@kitenet.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-19 10:03:40 +02:00			`for (ref = 0; ref + 1 < array->nr; ref++) {`
bundle: allow the same ref to be given more than once "git bundle create x master master" used to create a bundle that lists the same branch (master) twice. Cloning from such a bundle resulted in a needless warning "warning: Duplicated ref: refs/remotes/origin/master". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-18 07:27:08 +01:00			`for (src = ref + 1, dst = src;`
			`src < array->nr;`
			`src++) {`
			`if (!strcmp(objects[ref].name, objects[src].name))`
			`continue;`
			`if (src != dst)`
			`objects[dst] = objects[src];`
			`dst++;`
			`}`
			`array->nr = dst;`
			`}`
			`}`