git/tmp-objdir.c
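/*
 * Using the global `the_repository` is deprecated; this file has not
 * been converted away from it yet, so it opts in explicitly below.
 */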
#define USE_THE_REPOSITORY_VARIABLE
#include "git-compat-util.h"
#include "tmp-objdir.h"
#include "abspath.h"
#include "chdir-notify.h"
#include "dir.h"
#include "environment.h"
#include "object-file.h"
#include "path.h"
#include "string-list.h"
#include "strbuf.h"
#include "strvec.h"
#include "quote.h"
#include "object-store-ll.h"
#include "repository.h"
struct tmp_objdir {
struct strbuf path;
struct strvec env;
struct object_directory *prev_odb;
int will_destroy;
};
/*
* Allow only one tmp_objdir at a time in a running process, which simplifies
* our atexit cleanup routines. It's doubtful callers will ever need
* more than one, and we can expand later if so. You can have many such
* tmp_objdirs simultaneously in many processes, of course.
*/
static struct tmp_objdir *the_tmp_objdir;
static void tmp_objdir_free(struct tmp_objdir *t)
{
strbuf_release(&t->path);
strvec_clear(&t->env);
free(t);
}
int tmp_objdir_destroy(struct tmp_objdir *t)
{
int err;
if (!t)
return 0;
if (t == the_tmp_objdir)
the_tmp_objdir = NULL;
if (t->prev_odb)
restore_primary_odb(t->prev_odb, t->path.buf);
err = remove_dir_recursively(&t->path, 0);
tmp_objdir_free(t);
return err;
}
static void remove_tmp_objdir(void)
{
tmp_objdir_destroy(the_tmp_objdir);
}
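/*
 * Remove everything inside the temporary directory but keep the
 * directory itself (REMOVE_DIR_KEEP_TOPLEVEL), so the quarantine can
 * keep receiving objects.
 */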
void tmp_objdir_discard_objects(struct tmp_objdir *t)
{
remove_dir_recursively(&t->path, REMOVE_DIR_KEEP_TOPLEVEL);
}
/*
* These env_* functions are for setting up the child environment; the
* "replace" variant overrides the value of any existing variable with that
* "key". The "append" variant puts our new value at the end of a list,
* separated by PATH_SEP (which is what separates values in
* GIT_ALTERNATE_OBJECT_DIRECTORIES).
*/
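/*
 * For example (values are illustrative): appending "/tmp/xyz" to an
 * existing GIT_ALTERNATE_OBJECT_DIRECTORIES=/a produces the entry
 * "GIT_ALTERNATE_OBJECT_DIRECTORIES=/a:/tmp/xyz" on POSIX systems,
 * where PATH_SEP is ':'.
 */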
static void env_append(struct strvec *env, const char *key, const char *val)
{
struct strbuf quoted = STRBUF_INIT;
const char *old;
/*
* Avoid quoting if it's not necessary, for maximum compatibility
* with older parsers which don't understand the quoting.
*/
if (*val == '"' || strchr(val, PATH_SEP)) {
strbuf_addch(&quoted, '"');
quote_c_style(val, &quoted, NULL, 1);
strbuf_addch(&quoted, '"');
val = quoted.buf;
}
old = getenv(key);
if (!old)
strvec_pushf(env, "%s=%s", key, val);
else
strvec_pushf(env, "%s=%s%c%s", key, old, PATH_SEP, val);
strbuf_release(&quoted);
}
static void env_replace(struct strvec *env, const char *key, const char *val)
{
strvec_pushf(env, "%s=%s", key, val);
}
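/*
 * Pre-create the "pack" subdirectory of the temporary object directory
 * so that child processes can write packfiles into it right away.
 */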
static int setup_tmp_objdir(const char *root)
{
char *path;
int ret = 0;
path = xstrfmt("%s/pack", root);
ret = mkdir(path, 0777);
free(path);
return ret;
}
struct tmp_objdir *tmp_objdir_create(const char *prefix)
{
static int installed_handlers;
struct tmp_objdir *t;
if (the_tmp_objdir)
BUG("only one tmp_objdir can be used at a time");
t = xcalloc(1, sizeof(*t));
strbuf_init(&t->path, 0);
strvec_init(&t->env);
/*
* Use a string starting with tmp_ so that the builtin/prune.c code
* can recognize any stale objdirs left behind by a crash and delete
* them.
*/
strbuf_addf(&t->path, "%s/tmp_objdir-%s-XXXXXX",
repo_get_object_directory(the_repository), prefix);
if (!mkdtemp(t->path.buf)) {
/* free, not destroy, as we never touched the filesystem */
tmp_objdir_free(t);
return NULL;
}
the_tmp_objdir = t;
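/*
 * We install only an atexit() handler, not signal handlers: removing
 * the directory requires opendir(3)/closedir(3), which are not
 * async-signal-safe and have been seen to deadlock when called from a
 * signal handler. Directories left behind by a killed process are
 * recognized by their tmp_ prefix and cleaned up by gc instead.
 */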
if (!installed_handlers) {
atexit(remove_tmp_objdir);
installed_handlers++;
}
if (setup_tmp_objdir(t->path.buf)) {
tmp_objdir_destroy(t);
return NULL;
}
env_append(&t->env, ALTERNATE_DB_ENVIRONMENT,
absolute_path(repo_get_object_directory(the_repository)));
env_replace(&t->env, DB_ENVIRONMENT, absolute_path(t->path.buf));
env_replace(&t->env, GIT_QUARANTINE_ENVIRONMENT,
absolute_path(t->path.buf));
return t;
}
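/*
 * A minimal caller sketch (hypothetical names; see tmp-objdir.h for
 * the real API contract and builtin/receive-pack.c for a real user):
 *
 *	struct tmp_objdir *t = tmp_objdir_create("incoming");
 *	if (!t)
 *		die_errno("unable to create quarantine directory");
 *	strvec_pushv(&child.env, tmp_objdir_env(t));
 *	... run the child, check the objects it wrote ...
 *	if (ok)
 *		ret = tmp_objdir_migrate(t);	(keep the objects)
 *	else
 *		ret = tmp_objdir_destroy(t);	(throw them away)
 */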
/*
* Make sure we copy packfiles and their associated metafiles in the correct
* order. All of these ends_with checks are slightly expensive to do in
* the midst of a sorting routine, but in practice it shouldn't matter.
* We will have a relatively small number of packfiles to order, and loose
* objects exit early in the first line.
*/
static int pack_copy_priority(const char *name)
{
if (!starts_with(name, "pack"))
return 0;
if (ends_with(name, ".keep"))
return 1;
if (ends_with(name, ".pack"))
return 2;
if (ends_with(name, ".rev"))
return 3;
if (ends_with(name, ".idx"))
return 4;
return 5;
}
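/*
 * For a pack named "pack-1234" the resulting copy order is:
 * pack-1234.keep, pack-1234.pack, pack-1234.rev, pack-1234.idx.
 * The .idx comes last so that a reader never sees a pack index whose
 * packfile and metafiles are not already in place.
 */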
static int pack_copy_cmp(const char *a, const char *b)
{
return pack_copy_priority(a) - pack_copy_priority(b);
}
static int read_dir_paths(struct string_list *out, const char *path)
{
DIR *dh;
struct dirent *de;
dh = opendir(path);
if (!dh)
return -1;
while ((de = readdir(dh)))
if (de->d_name[0] != '.')
string_list_append(out, de->d_name);
closedir(dh);
return 0;
}
static int migrate_paths(struct strbuf *src, struct strbuf *dst,
enum finalize_object_file_flags flags);
static int migrate_one(struct strbuf *src, struct strbuf *dst,
enum finalize_object_file_flags flags)
{
struct stat st;
if (stat(src->buf, &st) < 0)
return -1;
if (S_ISDIR(st.st_mode)) {
if (!mkdir(dst->buf, 0777)) {
if (adjust_shared_perm(dst->buf))
return -1;
} else if (errno != EEXIST)
return -1;
return migrate_paths(src, dst, flags);
}
return finalize_object_file_flags(src->buf, dst->buf, flags);
}
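/*
 * A loose object "shard" is one of the two-hex-digit fan-out
 * directories (e.g. "2f" in ".git/objects/2f/..."). Loose objects are
 * exempt from the content-level collision check, so flag them below.
 */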
static int is_loose_object_shard(const char *name)
{
return strlen(name) == 2 && isxdigit(name[0]) && isxdigit(name[1]);
}
static int migrate_paths(struct strbuf *src, struct strbuf *dst,
enum finalize_object_file_flags flags)
{
size_t src_len = src->len, dst_len = dst->len;
struct string_list paths = STRING_LIST_INIT_DUP;
int i;
int ret = 0;
if (read_dir_paths(&paths, src->buf) < 0)
return -1;
paths.cmp = pack_copy_cmp;
string_list_sort(&paths);
for (i = 0; i < paths.nr; i++) {
const char *name = paths.items[i].string;
enum finalize_object_file_flags flags_copy = flags;
strbuf_addf(src, "/%s", name);
strbuf_addf(dst, "/%s", name);
if (is_loose_object_shard(name))
flags_copy |= FOF_SKIP_COLLISION_CHECK;
ret |= migrate_one(src, dst, flags_copy);
strbuf_setlen(src, src_len);
strbuf_setlen(dst, dst_len);
}
string_list_clear(&paths, 0);
return ret;
}
int tmp_objdir_migrate(struct tmp_objdir *t)
{
struct strbuf src = STRBUF_INIT, dst = STRBUF_INIT;
int ret;
if (!t)
return 0;
if (t->prev_odb) {
if (the_repository->objects->odb->will_destroy)
BUG("migrating an ODB that was marked for destruction");
restore_primary_odb(t->prev_odb, t->path.buf);
t->prev_odb = NULL;
}
strbuf_addbuf(&src, &t->path);
strbuf_addstr(&dst, repo_get_object_directory(the_repository));
ret = migrate_paths(&src, &dst, 0);
strbuf_release(&src);
strbuf_release(&dst);
tmp_objdir_destroy(t);
return ret;
}
const char **tmp_objdir_env(const struct tmp_objdir *t)
{
if (!t)
return NULL;
return t->env.v;
}
void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
{
add_to_alternates_memory(t->path.buf);
}
void tmp_objdir_replace_primary_odb(struct tmp_objdir *t, int will_destroy)
{
if (t->prev_odb)
BUG("the primary object database is already replaced");
t->prev_odb = set_temporary_primary_odb(t->path.buf, will_destroy);
t->will_destroy = will_destroy;
}
struct tmp_objdir *tmp_objdir_unapply_primary_odb(void)
{
if (!the_tmp_objdir || !the_tmp_objdir->prev_odb)
return NULL;
restore_primary_odb(the_tmp_objdir->prev_odb, the_tmp_objdir->path.buf);
the_tmp_objdir->prev_odb = NULL;
return the_tmp_objdir;
}
void tmp_objdir_reapply_primary_odb(struct tmp_objdir *t, const char *old_cwd,
const char *new_cwd)
{
char *path;
path = reparent_relative_path(old_cwd, new_cwd, t->path.buf);
strbuf_reset(&t->path);
strbuf_addstr(&t->path, path);
free(path);
tmp_objdir_replace_primary_odb(t, t->will_destroy);
}