mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-10-31 22:37:54 +01:00

1146 lines

30 KiB

C

Raw Normal View History

commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`#include "cache.h"`
			`#include "config.h"`
commit-graph: fix UX issue when .lock file exists We use the lockfile API to avoid multiple Git processes from writing to the commit-graph file in the .git/objects/info directory. In some cases, this directory may not exist, so we check for its existence. The existing code does the following when acquiring the lock: 1. Try to acquire the lock. 2. If it fails, try to create the .git/object/info directory. 3. Try to acquire the lock, failing if necessary. The problem is that if the lockfile exists, then the mkdir fails, giving an error that doesn't help the user: "fatal: cannot mkdir .git/objects/info: File exists" While technically this honors the lockfile, it does not help the user. Instead, do the following: 1. Check for existence of .git/objects/info; create if necessary. 2. Try to acquire the lock, failing if necessary. The new output looks like: fatal: Unable to create '<dir>/.git/objects/info/commit-graph.lock': File exists. Another git process seems to be running in this repository, e.g. an editor opened by 'git commit'. Please make sure all processes are terminated then try again. If it still fails, a git process may have crashed in this repository earlier: remove the file manually to continue. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-10 19:42:52 +02:00			`#include "dir.h"`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`#include "git-compat-util.h"`
			`#include "lockfile.h"`
			`#include "pack.h"`
			`#include "packfile.h"`
			`#include "commit.h"`
			`#include "object.h"`
commit-graph: add '--reachable' option When writing commit-graph files, it can be convenient to ask for all reachable commits (starting at the ref set) in the resulting file. This is particularly helpful when writing to stdin is complicated, such as a future integration with 'git gc'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:45 +02:00			`#include "refs.h"`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`#include "revision.h"`
			`#include "sha1-lookup.h"`
			`#include "commit-graph.h"`
Merge branch 'ds/commit-graph' Precompute and store information necessary for ancestry traversal in a separate file to optimize graph walking. * ds/commit-graph: commit-graph: implement "--append" option commit-graph: build graph from starting commits commit-graph: read only from specific pack-indexes commit: integrate commit graph with commit parsing commit-graph: close under reachability commit-graph: add core.commitGraph setting commit-graph: implement git commit-graph read commit-graph: implement git-commit-graph write commit-graph: implement write_commit_graph() commit-graph: create git-commit-graph builtin graph: add commit graph design document commit-graph: add format document csum-file: refactor finalize_hashfile() method csum-file: rename hashclose() to finalize_hashfile() 2018-05-08 08:59:20 +02:00			`#include "object-store.h"`
commit-graph: verify objects exist In the 'verify' subcommand, load commits directly from the object database to ensure they exist. Parse by skipping the commit-graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:36 +02:00			`#include "alloc.h"`
commit-graph: not compatible with replace objects Create new method commit_graph_compatible(r) to check if a given repository r is compatible with the commit-graph feature. Fill the method with a check to see if replace-objects exist. Test this interaction succeeds, including ignoring an existing commit-graph and failing to write a new commit-graph. However, we do ensure that we write a new commit-graph by setting read_replace_refs to 0, thereby ignoring the replace refs. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-20 20:24:27 +02:00			`#include "hashmap.h"`
			`#include "replace-object.h"`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`#include "progress.h"`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00
			`#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */`
			`#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */`
			`#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */`
			`#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */`
			`#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */`

			`#define GRAPH_DATA_WIDTH 36`

			`#define GRAPH_VERSION_1 0x1`
			`#define GRAPH_VERSION GRAPH_VERSION_1`

			`#define GRAPH_OID_VERSION_SHA1 1`
			`#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ`
			`#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1`
			`#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1`

			`#define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000`
			`#define GRAPH_PARENT_MISSING 0x7fffffff`
			`#define GRAPH_EDGE_LAST_MASK 0x7fffffff`
			`#define GRAPH_PARENT_NONE 0x70000000`

			`#define GRAPH_LAST_EDGE 0x80000000`

commit-graph: fix GRAPH_MIN_SIZE The GRAPH_MIN_SIZE macro should be the smallest size of a parsable commit-graph file. However, the minimum number of chunks was wrong. It is possible to write a commit-graph file with zero commits, and that violates this macro's value. Rewrite the macro, and use extra macros to better explain the magic constants. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:28 +02:00			`#define GRAPH_HEADER_SIZE 8`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`#define GRAPH_FANOUT_SIZE (4 * 256)`
			`#define GRAPH_CHUNKLOOKUP_WIDTH 12`
commit-graph: fix GRAPH_MIN_SIZE The GRAPH_MIN_SIZE macro should be the smallest size of a parsable commit-graph file. However, the minimum number of chunks was wrong. It is possible to write a commit-graph file with zero commits, and that violates this macro's value. Rewrite the macro, and use extra macros to better explain the magic constants. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:28 +02:00			`#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \`
			`+ GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00
commit-graph: implement git commit-graph read Teach git-commit-graph to read commit graph files and summarize their contents. Use the read subcommand to verify the contents of a commit graph file in the tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:02 +02:00			`char get_commit_graph_filename(const char obj_dir)`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`{`
			`return xstrfmt("%s/info/commit-graph", obj_dir);`
			`}`

commit-graph: implement git commit-graph read Teach git-commit-graph to read commit graph files and summarize their contents. Use the read subcommand to verify the contents of a commit graph file in the tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:02 +02:00			`static struct commit_graph *alloc_commit_graph(void)`
			`{`
			`struct commit_graph g = xcalloc(1, sizeof(g));`
			`g->graph_fd = -1;`

			`return g;`
			`}`

commit-graph: not compatible with replace objects Create new method commit_graph_compatible(r) to check if a given repository r is compatible with the commit-graph feature. Fill the method with a check to see if replace-objects exist. Test this interaction succeeds, including ignoring an existing commit-graph and failing to write a new commit-graph. However, we do ensure that we write a new commit-graph by setting read_replace_refs to 0, thereby ignoring the replace refs. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-20 20:24:27 +02:00			`extern int read_replace_refs;`

			`static int commit_graph_compatible(struct repository *r)`
			`{`
commit-graph: not compatible with uninitialized repo Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-20 20:24:32 +02:00			`if (!r->gitdir)`
			`return 0;`

commit-graph: not compatible with replace objects Create new method commit_graph_compatible(r) to check if a given repository r is compatible with the commit-graph feature. Fill the method with a check to see if replace-objects exist. Test this interaction succeeds, including ignoring an existing commit-graph and failing to write a new commit-graph. However, we do ensure that we write a new commit-graph by setting read_replace_refs to 0, thereby ignoring the replace refs. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-20 20:24:27 +02:00			`if (read_replace_refs) {`
			`prepare_replace_object(r);`
			`if (hashmap_get_size(&r->objects->replace_map->map))`
			`return 0;`
			`}`

commit-graph: not compatible with grafts Augment commit_graph_compatible(r) to return false when the given repository r has commit grafts or is a shallow clone. Test that in these situations we ignore existing commit-graph files and we do not write new commit-graph files. Helped-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-20 20:24:30 +02:00			`prepare_commit_graft(r);`
			`if (r->parsed_objects && r->parsed_objects->grafts_nr)`
			`return 0;`
			`if (is_repository_shallow(r))`
			`return 0;`

commit-graph: not compatible with replace objects Create new method commit_graph_compatible(r) to check if a given repository r is compatible with the commit-graph feature. Fill the method with a check to see if replace-objects exist. Test this interaction succeeds, including ignoring an existing commit-graph and failing to write a new commit-graph. However, we do ensure that we write a new commit-graph by setting read_replace_refs to 0, thereby ignoring the replace refs. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-20 20:24:27 +02:00			`return 1;`
			`}`

commit-graph: implement git commit-graph read Teach git-commit-graph to read commit graph files and summarize their contents. Use the read subcommand to verify the contents of a commit graph file in the tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:02 +02:00			`struct commit_graph load_commit_graph_one(const char graph_file)`
			`{`
			`void *graph_map;`
			`const unsigned char data, chunk_lookup;`
			`size_t graph_size;`
			`struct stat st;`
			`uint32_t i;`
			`struct commit_graph *graph;`
			`int fd = git_open(graph_file);`
			`uint64_t last_chunk_offset;`
			`uint32_t last_chunk_id;`
			`uint32_t graph_signature;`
			`unsigned char graph_version, hash_version;`

			`if (fd < 0)`
			`return NULL;`
			`if (fstat(fd, &st)) {`
			`close(fd);`
			`return NULL;`
			`}`
			`graph_size = xsize_t(st.st_size);`

			`if (graph_size < GRAPH_MIN_SIZE) {`
			`close(fd);`
commit-graph.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-21 09:49:26 +02:00			`die(_("graph file %s is too small"), graph_file);`
commit-graph: implement git commit-graph read Teach git-commit-graph to read commit graph files and summarize their contents. Use the read subcommand to verify the contents of a commit graph file in the tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:02 +02:00			`}`
			`graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);`
			`data = (const unsigned char *)graph_map;`

			`graph_signature = get_be32(data);`
			`if (graph_signature != GRAPH_SIGNATURE) {`
commit-graph.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-21 09:49:26 +02:00			`error(_("graph signature %X does not match signature %X"),`
commit-graph: implement git commit-graph read Teach git-commit-graph to read commit graph files and summarize their contents. Use the read subcommand to verify the contents of a commit graph file in the tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:02 +02:00			`graph_signature, GRAPH_SIGNATURE);`
			`goto cleanup_fail;`
			`}`

			`graph_version = (unsigned char)(data + 4);`
			`if (graph_version != GRAPH_VERSION) {`
commit-graph.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-21 09:49:26 +02:00			`error(_("graph version %X does not match version %X"),`
commit-graph: implement git commit-graph read Teach git-commit-graph to read commit graph files and summarize their contents. Use the read subcommand to verify the contents of a commit graph file in the tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:02 +02:00			`graph_version, GRAPH_VERSION);`
			`goto cleanup_fail;`
			`}`

			`hash_version = (unsigned char)(data + 5);`
			`if (hash_version != GRAPH_OID_VERSION) {`
commit-graph.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-21 09:49:26 +02:00			`error(_("hash version %X does not match version %X"),`
commit-graph: implement git commit-graph read Teach git-commit-graph to read commit graph files and summarize their contents. Use the read subcommand to verify the contents of a commit graph file in the tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:02 +02:00			`hash_version, GRAPH_OID_VERSION);`
			`goto cleanup_fail;`
			`}`

			`graph = alloc_commit_graph();`

			`graph->hash_len = GRAPH_OID_LEN;`
			`graph->num_chunks = (unsigned char)(data + 6);`
			`graph->graph_fd = fd;`
			`graph->data = graph_map;`
			`graph->data_len = graph_size;`

			`last_chunk_id = 0;`
			`last_chunk_offset = 8;`
			`chunk_lookup = data + 8;`
			`for (i = 0; i < graph->num_chunks; i++) {`
			`uint32_t chunk_id = get_be32(chunk_lookup + 0);`
			`uint64_t chunk_offset = get_be64(chunk_lookup + 4);`
			`int chunk_repeated = 0;`

			`chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;`

			`if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {`
commit-graph.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-21 09:49:26 +02:00			`error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),`
commit-graph: implement git commit-graph read Teach git-commit-graph to read commit graph files and summarize their contents. Use the read subcommand to verify the contents of a commit graph file in the tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:02 +02:00			`(uint32_t)chunk_offset);`
			`goto cleanup_fail;`
			`}`

			`switch (chunk_id) {`
			`case GRAPH_CHUNKID_OIDFANOUT:`
			`if (graph->chunk_oid_fanout)`
			`chunk_repeated = 1;`
			`else`
			`graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);`
			`break;`

			`case GRAPH_CHUNKID_OIDLOOKUP:`
			`if (graph->chunk_oid_lookup)`
			`chunk_repeated = 1;`
			`else`
			`graph->chunk_oid_lookup = data + chunk_offset;`
			`break;`

			`case GRAPH_CHUNKID_DATA:`
			`if (graph->chunk_commit_data)`
			`chunk_repeated = 1;`
			`else`
			`graph->chunk_commit_data = data + chunk_offset;`
			`break;`

			`case GRAPH_CHUNKID_LARGEEDGES:`
			`if (graph->chunk_large_edges)`
			`chunk_repeated = 1;`
			`else`
			`graph->chunk_large_edges = data + chunk_offset;`
			`break;`
			`}`

			`if (chunk_repeated) {`
commit-graph.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-21 09:49:26 +02:00			`error(_("chunk id %08x appears multiple times"), chunk_id);`
commit-graph: implement git commit-graph read Teach git-commit-graph to read commit graph files and summarize their contents. Use the read subcommand to verify the contents of a commit graph file in the tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:02 +02:00			`goto cleanup_fail;`
			`}`

			`if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)`
			`{`
			`graph->num_commits = (chunk_offset - last_chunk_offset)`
			`/ graph->hash_len;`
			`}`

			`last_chunk_id = chunk_id;`
			`last_chunk_offset = chunk_offset;`
			`}`

			`return graph;`

			`cleanup_fail:`
			`munmap(graph_map, graph_size);`
			`close(fd);`
			`exit(1);`
			`}`

commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`static void prepare_commit_graph_one(struct repository r, const char obj_dir)`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`{`
			`char *graph_name;`

commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`if (r->objects->commit_graph)`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`return;`

			`graph_name = get_commit_graph_filename(obj_dir);`
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`r->objects->commit_graph =`
commit-graph: store graph in struct object_store Instead of storing commit graphs in static variables, store them in struct object_store. There are no changes to the signatures of existing functions - they all still only support the_repository, and support for other instances of struct repository will be added in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:41 +02:00			`load_commit_graph_one(graph_name);`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00
			`FREE_AND_NULL(graph_name);`
			`}`

commit-graph: refactor preparing commit graph Two functions in the code (1) check if the repository is configured for commit graphs, (2) call prepare_commit_graph(), and (3) check if the graph exists. Move (1) and (3) into prepare_commit_graph(), reducing duplication of code. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:37 +02:00			`/*`
			`* Return 1 if commit_graph is non-NULL, and 0 otherwise.`
			`*`
			`* On the first invocation, this function attemps to load the commit`
			`* graph if the_repository is configured to have one.`
			`*/`
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`static int prepare_commit_graph(struct repository *r)`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`{`
			`struct alternate_object_database *alt;`
			`char *obj_dir;`
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`int config_value;`

			`if (r->objects->commit_graph_attempted)`
			`return !!r->objects->commit_graph;`
			`r->objects->commit_graph_attempted = 1;`

commit-graph: define GIT_TEST_COMMIT_GRAPH The commit-graph feature is tested in isolation by t5318-commit-graph.sh and t6600-test-reach.sh, but there are many more interesting scenarios involving commit walks. Many of these scenarios are covered by the existing test suite, but we need to maintain coverage when the optional commit-graph structure is not present. To allow running the full test suite with the commit-graph present, add a new test environment variable, GIT_TEST_COMMIT_GRAPH. Similar to GIT_TEST_SPLIT_INDEX, this variable makes every Git command try to load the commit-graph when parsing commits, and writes the commit-graph file after every 'git commit' command. There are a few tests that rely on commits not existing in pack-files to trigger important events, so manually set GIT_TEST_COMMIT_GRAPH to false for the necessary commands. There is one test in t6024-recursive-merge.sh that relies on the merge-base algorithm picking one of two ambiguous merge-bases, and the commit-graph feature changes which merge-base is picked. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-29 14:49:04 +02:00			`if (!git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&`
			`(repo_config_get_bool(r, "core.commitgraph", &config_value) \|\|`
			`!config_value))`
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`/*`
			`* This repository is not configured to use commit graphs, so`
			`* do not load one. (But report commit_graph_attempted anyway`
			`* so that commit graph loading is not attempted again for this`
			`* repository.)`
			`*/`
commit-graph: refactor preparing commit graph Two functions in the code (1) check if the repository is configured for commit graphs, (2) call prepare_commit_graph(), and (3) check if the graph exists. Move (1) and (3) into prepare_commit_graph(), reducing duplication of code. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:37 +02:00			`return 0;`

commit-graph: not compatible with replace objects Create new method commit_graph_compatible(r) to check if a given repository r is compatible with the commit-graph feature. Fill the method with a check to see if replace-objects exist. Test this interaction succeeds, including ignoring an existing commit-graph and failing to write a new commit-graph. However, we do ensure that we write a new commit-graph by setting read_replace_refs to 0, thereby ignoring the replace refs. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-20 20:24:27 +02:00			`if (!commit_graph_compatible(r))`
			`return 0;`

commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`obj_dir = r->objects->objectdir;`
			`prepare_commit_graph_one(r, obj_dir);`
			`prepare_alt_odb(r);`
			`for (alt = r->objects->alt_odb_list;`
			`!r->objects->commit_graph && alt;`
Merge branch 'ds/commit-graph' Precompute and store information necessary for ancestry traversal in a separate file to optimize graph walking. * ds/commit-graph: commit-graph: implement "--append" option commit-graph: build graph from starting commits commit-graph: read only from specific pack-indexes commit: integrate commit graph with commit parsing commit-graph: close under reachability commit-graph: add core.commitGraph setting commit-graph: implement git commit-graph read commit-graph: implement git-commit-graph write commit-graph: implement write_commit_graph() commit-graph: create git-commit-graph builtin graph: add commit graph design document commit-graph: add format document csum-file: refactor finalize_hashfile() method csum-file: rename hashclose() to finalize_hashfile() 2018-05-08 08:59:20 +02:00			`alt = alt->next)`
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`prepare_commit_graph_one(r, alt->path);`
			`return !!r->objects->commit_graph;`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`}`

commit-reach: use can_all_from_reach The is_descendant_of method previously used in_merge_bases() to check if the commit can reach any of the commits in the provided list. This had two performance problems: 1. The performance is quadratic in worst-case. 2. A single in_merge_bases() call requires walking beyond the target commit in order to find the full set of boundary commits that may be merge-bases. The can_all_from_reach method avoids this quadratic behavior and can limit the search beyond the target commits using generation numbers. It requires a small prototype adjustment to stop using commit-date as a cutoff, as that optimization is no longer appropriate here. Since in_merge_bases() uses paint_down_to_common(), is_descendant_of() naturally found cutoffs to avoid walking the entire commit graph. Since we want to always return the correct result, we cannot use the min_commit_date cutoff in can_all_from_reach. We then rely on generation numbers to provide the cutoff. Since not all repos will have a commit-graph file, nor will we always have generation numbers computed for a commit-graph file, create a new method, generation_numbers_enabled(), that checks for a commit-graph file and sees if the first commit in the file has a non-zero generation number. In the case that we do not have generation numbers, use the old logic for is_descendant_of(). Performance was meausured on a copy of the Linux repository using the 'test-tool reach is_descendant_of' command using this input: A:v4.9 X:v4.10 X:v4.11 X:v4.12 X:v4.13 X:v4.14 X:v4.15 X:v4.16 X:v4.17 X.v3.0 Note that this input is tailored to demonstrate the quadratic nature of the previous method, as it will compute merge-bases for v4.9 versus all of the later versions before checking against v4.1. Before: 0.26 s After: 0.21 s Since we previously used the is_descendant_of method in the ref_newer method, we also measured performance there using 'test-tool reach ref_newer' with this input: A:v4.9 B:v3.19 Before: 0.10 s After: 0.08 s By adding a new commit with parent v3.19, we test the non-reachable case of ref_newer: Before: 0.09 s After: 0.08 s Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-20 18:33:30 +02:00			`int generation_numbers_enabled(struct repository *r)`
			`{`
			`uint32_t first_generation;`
			`struct commit_graph *g;`
			`if (!prepare_commit_graph(r))`
			`return 0;`

			`g = r->objects->commit_graph;`

			`if (!g->num_commits)`
			`return 0;`

			`first_generation = get_be32(g->chunk_commit_data +`
			`g->hash_len + 8) >> 2;`

			`return !!first_generation;`
			`}`

commit-graph: close_commit_graph before shallow walk Call close_commit_graph() when about to start a rev-list walk that includes shallow commits. This is necessary in code paths that "fake" shallow commits for the sake of fetch. Specifically, test 351 in t5500-fetch-pack.sh runs git fetch --shallow-exclude one origin with a file-based transfer. When the "remote" has a commit-graph, we do not prevent the commit-graph from being loaded, but then the commits are intended to be dynamically transferred into shallow commits during get_shallow_commits_by_rev_list(). By closing the commit-graph before this call, we prevent this interaction. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-20 20:24:34 +02:00			`void close_commit_graph(struct repository *r)`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`{`
commit-graph: close_commit_graph before shallow walk Call close_commit_graph() when about to start a rev-list walk that includes shallow commits. This is necessary in code paths that "fake" shallow commits for the sake of fetch. Specifically, test 351 in t5500-fetch-pack.sh runs git fetch --shallow-exclude one origin with a file-based transfer. When the "remote" has a commit-graph, we do not prevent the commit-graph from being loaded, but then the commits are intended to be dynamically transferred into shallow commits during get_shallow_commits_by_rev_list(). By closing the commit-graph before this call, we prevent this interaction. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-20 20:24:34 +02:00			`free_commit_graph(r->objects->commit_graph);`
			`r->objects->commit_graph = NULL;`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`}`

			`static int bsearch_graph(struct commit_graph g, struct object_id oid, uint32_t *pos)`
			`{`
			`return bsearch_hash(oid->hash, g->chunk_oid_fanout,`
			`g->chunk_oid_lookup, g->hash_len, pos);`
			`}`

			`static struct commit_list *insert_parent_or_die(struct commit_graph g,`
			`uint64_t pos,`
			`struct commit_list **pptr)`
			`{`
			`struct commit *c;`
			`struct object_id oid;`
commit-graph: verify objects exist In the 'verify' subcommand, load commits directly from the object database to ensure they exist. Parse by skipping the commit-graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:36 +02:00
commit-graph: verify parent list The commit-graph file stores parents in a two-column portion of the commit data chunk. If there is only one parent, then the second column stores 0xFFFFFFFF to indicate no second parent. The 'verify' subcommand checks the parent list for the commit loaded from the commit-graph and the one parsed from the object database. Test these checks for corrupt parents, too many parents, and wrong parents. Add a boundary check to insert_parent_or_die() for when the parent position value is out of range. The octopus merge will be tested in a later commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:38 +02:00			`if (pos >= g->num_commits)`
			`die("invalid parent position %"PRIu64, pos);`

commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);`
commit: add repository argument to lookup_commit Add a repository argument to allow callers of lookup_commit to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-29 03:21:59 +02:00			`c = lookup_commit(the_repository, &oid);`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`if (!c)`
commit-graph.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-21 09:49:26 +02:00			`die(_("could not find commit %s"), oid_to_hex(&oid));`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`c->graph_pos = pos;`
			`return &commit_list_insert(c, pptr)->next;`
			`}`

commit-graph: always load commit-graph information Most code paths load commits using lookup_commit() and then parse_commit(). In some cases, including some branch lookups, the commit is parsed using parse_object_buffer() which side-steps parse_commit() in favor of parse_commit_buffer(). With generation numbers in the commit-graph, we need to ensure that any commit that exists in the commit-graph file has its generation number loaded. Create new load_commit_graph_info() method to fill in the information for a commit that exists only in the commit-graph file. Call it from parse_commit_buffer() after loading the other commit information from the given buffer. Only fill this information when specified by the 'check_graph' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:13 +02:00			`static void fill_commit_graph_info(struct commit item, struct commit_graph g, uint32_t pos)`
			`{`
			`const unsigned char commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH pos;`
			`item->graph_pos = pos;`
			`item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;`
			`}`

commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`static int fill_commit_in_graph(struct commit item, struct commit_graph g, uint32_t pos)`
			`{`
			`uint32_t edge_value;`
			`uint32_t *parent_data_ptr;`
			`uint64_t date_low, date_high;`
			`struct commit_list **pptr;`
			`const unsigned char commit_data = g->chunk_commit_data + (g->hash_len + 16) pos;`

			`item->object.parsed = 1;`
			`item->graph_pos = pos;`

commit-graph: lazy-load trees for commits The commit-graph file provides quick access to commit data, including the OID of the root tree for each commit in the graph. When performing a deep commit-graph walk, we may not need to load most of the trees for these commits. Delay loading the tree object for a commit loaded from the graph until requested via get_commit_tree(). Do not lazy-load trees for commits not in the graph, since that requires duplicate parsing and the relative peformance improvement when trees are not needed is small. On the Linux repository, performance tests were run for the following command: git log --graph --oneline -1000 Before: 0.92s After: 0.66s Rel %: -28.3% Adding '-- kernel/' to the command requires loading the root tree for every commit that is walked. There was no measureable performance change as a result of this patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-06 21:09:46 +02:00			`item->maybe_tree = NULL;`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00
			`date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;`
			`date_low = get_be32(commit_data + g->hash_len + 12);`
			`item->date = (timestamp_t)((date_high << 32) \| date_low);`

commit: add generation number to struct commit The generation number of a commit is defined recursively as follows: * If a commit A has no parents, then the generation number of A is one. * If a commit A has parents, then the generation number of A is one more than the maximum generation number among the parents of A. Add a uint32_t generation field to struct commit so we can pass this information to revision walks. We use three special values to signal the generation number is invalid: GENERATION_NUMBER_INFINITY 0xFFFFFFFF GENERATION_NUMBER_MAX 0x3FFFFFFF GENERATION_NUMBER_ZERO 0 The first (_INFINITY) means the generation number has not been loaded or computed. The second (_MAX) means the generation number is too large to store in the commit-graph file. The third (_ZERO) means the generation number was loaded from a commit graph file that was written by a version of git that did not support generation numbers. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-25 16:37:55 +02:00			`item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;`

commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`pptr = &item->parents;`

			`edge_value = get_be32(commit_data + g->hash_len);`
			`if (edge_value == GRAPH_PARENT_NONE)`
			`return 1;`
			`pptr = insert_parent_or_die(g, edge_value, pptr);`

			`edge_value = get_be32(commit_data + g->hash_len + 4);`
			`if (edge_value == GRAPH_PARENT_NONE)`
			`return 1;`
			`if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) {`
			`pptr = insert_parent_or_die(g, edge_value, pptr);`
			`return 1;`
			`}`

			`parent_data_ptr = (uint32_t*)(g->chunk_large_edges +`
			`4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));`
			`do {`
			`edge_value = get_be32(parent_data_ptr);`
			`pptr = insert_parent_or_die(g,`
			`edge_value & GRAPH_EDGE_LAST_MASK,`
			`pptr);`
			`parent_data_ptr++;`
			`} while (!(edge_value & GRAPH_LAST_EDGE));`

			`return 1;`
			`}`

commit-graph: always load commit-graph information Most code paths load commits using lookup_commit() and then parse_commit(). In some cases, including some branch lookups, the commit is parsed using parse_object_buffer() which side-steps parse_commit() in favor of parse_commit_buffer(). With generation numbers in the commit-graph, we need to ensure that any commit that exists in the commit-graph file has its generation number loaded. Create new load_commit_graph_info() method to fill in the information for a commit that exists only in the commit-graph file. Call it from parse_commit_buffer() after loading the other commit information from the given buffer. Only fill this information when specified by the 'check_graph' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:13 +02:00			`static int find_commit_in_graph(struct commit item, struct commit_graph g, uint32_t *pos)`
			`{`
			`if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {`
			`*pos = item->graph_pos;`
			`return 1;`
			`} else {`
			`return bsearch_graph(g, &(item->object.oid), pos);`
			`}`
			`}`

commit-graph: parse commit from chosen graph Before verifying a commit-graph file against the object database, we need to parse all commits from the given commit-graph file. Create parse_commit_in_graph_one() to target a given struct commit_graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:29 +02:00			`static int parse_commit_in_graph_one(struct commit_graph g, struct commit item)`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`{`
commit-graph: always load commit-graph information Most code paths load commits using lookup_commit() and then parse_commit(). In some cases, including some branch lookups, the commit is parsed using parse_object_buffer() which side-steps parse_commit() in favor of parse_commit_buffer(). With generation numbers in the commit-graph, we need to ensure that any commit that exists in the commit-graph file has its generation number loaded. Create new load_commit_graph_info() method to fill in the information for a commit that exists only in the commit-graph file. Call it from parse_commit_buffer() after loading the other commit information from the given buffer. Only fill this information when specified by the 'check_graph' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:13 +02:00			`uint32_t pos;`

commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`if (item->object.parsed)`
			`return 1;`
commit-graph: parse commit from chosen graph Before verifying a commit-graph file against the object database, we need to parse all commits from the given commit-graph file. Create parse_commit_in_graph_one() to target a given struct commit_graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:29 +02:00
			`if (find_commit_in_graph(item, g, &pos))`
			`return fill_commit_in_graph(item, g, pos);`

			`return 0;`
			`}`

commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`int parse_commit_in_graph(struct repository r, struct commit item)`
commit-graph: parse commit from chosen graph Before verifying a commit-graph file against the object database, we need to parse all commits from the given commit-graph file. Create parse_commit_in_graph_one() to target a given struct commit_graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:29 +02:00			`{`
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`if (!prepare_commit_graph(r))`
commit-graph: parse commit from chosen graph Before verifying a commit-graph file against the object database, we need to parse all commits from the given commit-graph file. Create parse_commit_in_graph_one() to target a given struct commit_graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:29 +02:00			`return 0;`
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`return parse_commit_in_graph_one(r->objects->commit_graph, item);`
commit: integrate commit graph with commit parsing Teach Git to inspect a commit graph file to supply the contents of a struct commit when calling parse_commit_gently(). This implementation satisfies all post-conditions on the struct commit, including loading parents, the root tree, and the commit date. If core.commitGraph is false, then do not check graph files. In test script t5318-commit-graph.sh, add output-matching conditions on read-only graph operations. By loading commits from the graph instead of parsing commit buffers, we save a lot of time on long commit walks. Here are some performance results for a copy of the Linux repository where 'master' has 678,653 reachable commits and is behind 'origin/master' by 59,929 commits. \| Command \| Before \| After \| Rel % \| \|----------------------------------\|--------\|--------\|-------\| \| log --oneline --topo-order -1000 \| 8.31s \| 0.94s \| -88% \| \| branch -vv \| 1.02s \| 0.14s \| -86% \| \| rev-list --all \| 5.89s \| 1.07s \| -81% \| \| rev-list --all --objects \| 66.15s \| 58.45s \| -11% \| Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:05 +02:00			`}`

commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`void load_commit_graph_info(struct repository r, struct commit item)`
commit-graph: always load commit-graph information Most code paths load commits using lookup_commit() and then parse_commit(). In some cases, including some branch lookups, the commit is parsed using parse_object_buffer() which side-steps parse_commit() in favor of parse_commit_buffer(). With generation numbers in the commit-graph, we need to ensure that any commit that exists in the commit-graph file has its generation number loaded. Create new load_commit_graph_info() method to fill in the information for a commit that exists only in the commit-graph file. Call it from parse_commit_buffer() after loading the other commit information from the given buffer. Only fill this information when specified by the 'check_graph' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:13 +02:00			`{`
			`uint32_t pos;`
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`if (!prepare_commit_graph(r))`
commit-graph: always load commit-graph information Most code paths load commits using lookup_commit() and then parse_commit(). In some cases, including some branch lookups, the commit is parsed using parse_object_buffer() which side-steps parse_commit() in favor of parse_commit_buffer(). With generation numbers in the commit-graph, we need to ensure that any commit that exists in the commit-graph file has its generation number loaded. Create new load_commit_graph_info() method to fill in the information for a commit that exists only in the commit-graph file. Call it from parse_commit_buffer() after loading the other commit information from the given buffer. Only fill this information when specified by the 'check_graph' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:13 +02:00			`return;`
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`if (find_commit_in_graph(item, r->objects->commit_graph, &pos))`
			`fill_commit_graph_info(item, r->objects->commit_graph, pos);`
commit-graph: always load commit-graph information Most code paths load commits using lookup_commit() and then parse_commit(). In some cases, including some branch lookups, the commit is parsed using parse_object_buffer() which side-steps parse_commit() in favor of parse_commit_buffer(). With generation numbers in the commit-graph, we need to ensure that any commit that exists in the commit-graph file has its generation number loaded. Create new load_commit_graph_info() method to fill in the information for a commit that exists only in the commit-graph file. Call it from parse_commit_buffer() after loading the other commit information from the given buffer. Only fill this information when specified by the 'check_graph' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:13 +02:00			`}`

commit-graph: lazy-load trees for commits The commit-graph file provides quick access to commit data, including the OID of the root tree for each commit in the graph. When performing a deep commit-graph walk, we may not need to load most of the trees for these commits. Delay loading the tree object for a commit loaded from the graph until requested via get_commit_tree(). Do not lazy-load trees for commits not in the graph, since that requires duplicate parsing and the relative peformance improvement when trees are not needed is small. On the Linux repository, performance tests were run for the following command: git log --graph --oneline -1000 Before: 0.92s After: 0.66s Rel %: -28.3% Adding '-- kernel/' to the command requires loading the root tree for every commit that is walked. There was no measureable performance change as a result of this patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-06 21:09:46 +02:00			`static struct tree load_tree_for_commit(struct commit_graph g, struct commit *c)`
			`{`
			`struct object_id oid;`
			`const unsigned char *commit_data = g->chunk_commit_data +`
			`GRAPH_DATA_WIDTH * (c->graph_pos);`

			`hashcpy(oid.hash, commit_data);`
tree: add repository argument to lookup_tree Add a repository argument to allow the callers of lookup_tree to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-29 03:21:56 +02:00			`c->maybe_tree = lookup_tree(the_repository, &oid);`
commit-graph: lazy-load trees for commits The commit-graph file provides quick access to commit data, including the OID of the root tree for each commit in the graph. When performing a deep commit-graph walk, we may not need to load most of the trees for these commits. Delay loading the tree object for a commit loaded from the graph until requested via get_commit_tree(). Do not lazy-load trees for commits not in the graph, since that requires duplicate parsing and the relative peformance improvement when trees are not needed is small. On the Linux repository, performance tests were run for the following command: git log --graph --oneline -1000 Before: 0.92s After: 0.66s Rel %: -28.3% Adding '-- kernel/' to the command requires loading the root tree for every commit that is walked. There was no measureable performance change as a result of this patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-06 21:09:46 +02:00
			`return c->maybe_tree;`
			`}`

commit-graph: load a root tree from specific graph When lazy-loading a tree for a commit, it will be important to select the tree from a specific struct commit_graph. Create a new method that specifies the commit-graph file and use that in get_commit_tree_in_graph(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:31 +02:00			`static struct tree get_commit_tree_in_graph_one(struct commit_graph g,`
			`const struct commit *c)`
commit-graph: lazy-load trees for commits The commit-graph file provides quick access to commit data, including the OID of the root tree for each commit in the graph. When performing a deep commit-graph walk, we may not need to load most of the trees for these commits. Delay loading the tree object for a commit loaded from the graph until requested via get_commit_tree(). Do not lazy-load trees for commits not in the graph, since that requires duplicate parsing and the relative peformance improvement when trees are not needed is small. On the Linux repository, performance tests were run for the following command: git log --graph --oneline -1000 Before: 0.92s After: 0.66s Rel %: -28.3% Adding '-- kernel/' to the command requires loading the root tree for every commit that is walked. There was no measureable performance change as a result of this patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-06 21:09:46 +02:00			`{`
			`if (c->maybe_tree)`
			`return c->maybe_tree;`
			`if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)`
commit-graph: load a root tree from specific graph When lazy-loading a tree for a commit, it will be important to select the tree from a specific struct commit_graph. Create a new method that specifies the commit-graph file and use that in get_commit_tree_in_graph(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:31 +02:00			`BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");`

			`return load_tree_for_commit(g, (struct commit *)c);`
			`}`
commit-graph: lazy-load trees for commits The commit-graph file provides quick access to commit data, including the OID of the root tree for each commit in the graph. When performing a deep commit-graph walk, we may not need to load most of the trees for these commits. Delay loading the tree object for a commit loaded from the graph until requested via get_commit_tree(). Do not lazy-load trees for commits not in the graph, since that requires duplicate parsing and the relative peformance improvement when trees are not needed is small. On the Linux repository, performance tests were run for the following command: git log --graph --oneline -1000 Before: 0.92s After: 0.66s Rel %: -28.3% Adding '-- kernel/' to the command requires loading the root tree for every commit that is walked. There was no measureable performance change as a result of this patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-06 21:09:46 +02:00
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`struct tree get_commit_tree_in_graph(struct repository r, const struct commit *c)`
commit-graph: load a root tree from specific graph When lazy-loading a tree for a commit, it will be important to select the tree from a specific struct commit_graph. Create a new method that specifies the commit-graph file and use that in get_commit_tree_in_graph(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:31 +02:00			`{`
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`return get_commit_tree_in_graph_one(r->objects->commit_graph, c);`
commit-graph: lazy-load trees for commits The commit-graph file provides quick access to commit data, including the OID of the root tree for each commit in the graph. When performing a deep commit-graph walk, we may not need to load most of the trees for these commits. Delay loading the tree object for a commit loaded from the graph until requested via get_commit_tree(). Do not lazy-load trees for commits not in the graph, since that requires duplicate parsing and the relative peformance improvement when trees are not needed is small. On the Linux repository, performance tests were run for the following command: git log --graph --oneline -1000 Before: 0.92s After: 0.66s Rel %: -28.3% Adding '-- kernel/' to the command requires loading the root tree for every commit that is walked. There was no measureable performance change as a result of this patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-06 21:09:46 +02:00			`}`

commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`static void write_graph_chunk_fanout(struct hashfile *f,`
			`struct commit **commits,`
			`int nr_commits)`
			`{`
			`int i, count = 0;`
			`struct commit **list = commits;`

			`/*`
			`* Write the first-level table (the list is sorted,`
			`* but we use a 256-entry lookup to be able to avoid`
			`* having to do eight extra binary search iterations).`
			`*/`
			`for (i = 0; i < 256; i++) {`
			`while (count < nr_commits) {`
			`if ((*list)->object.oid.hash[0] != i)`
			`break;`
			`count++;`
			`list++;`
			`}`

			`hashwrite_be32(f, count);`
			`}`
			`}`

			`static void write_graph_chunk_oids(struct hashfile *f, int hash_len,`
			`struct commit **commits, int nr_commits)`
			`{`
			`struct commit **list = commits;`
			`int count;`
			`for (count = 0; count < nr_commits; count++, list++)`
			`hashwrite(f, (*list)->object.oid.hash, (int)hash_len);`
			`}`

			`static const unsigned char commit_to_sha1(size_t index, void table)`
			`{`
			`struct commit **commits = table;`
			`return commits[index]->object.oid.hash;`
			`}`

			`static void write_graph_chunk_data(struct hashfile *f, int hash_len,`
			`struct commit **commits, int nr_commits)`
			`{`
			`struct commit **list = commits;`
			`struct commit **last = commits + nr_commits;`
			`uint32_t num_extra_edges = 0;`

			`while (list < last) {`
			`struct commit_list *parent;`
			`int edge_value;`
			`uint32_t packedDate[2];`

			`parse_commit(*list);`
treewide: replace maybe_tree with accessor methods In anticipation of making trees load lazily, create a Coccinelle script (contrib/coccinelle/commit.cocci) to ensure that all references to the 'maybe_tree' member of struct commit are either mutations or accesses through get_commit_tree() or get_commit_tree_oid(). Apply the Coccinelle script to create the rest of the patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-06 21:09:38 +02:00			`hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00
			`parent = (*list)->parents;`

			`if (!parent)`
			`edge_value = GRAPH_PARENT_NONE;`
			`else {`
			`edge_value = sha1_pos(parent->item->object.oid.hash,`
			`commits,`
			`nr_commits,`
			`commit_to_sha1);`

			`if (edge_value < 0)`
			`edge_value = GRAPH_PARENT_MISSING;`
			`}`

			`hashwrite_be32(f, edge_value);`

			`if (parent)`
			`parent = parent->next;`

			`if (!parent)`
			`edge_value = GRAPH_PARENT_NONE;`
			`else if (parent->next)`
			`edge_value = GRAPH_OCTOPUS_EDGES_NEEDED \| num_extra_edges;`
			`else {`
			`edge_value = sha1_pos(parent->item->object.oid.hash,`
			`commits,`
			`nr_commits,`
			`commit_to_sha1);`
			`if (edge_value < 0)`
			`edge_value = GRAPH_PARENT_MISSING;`
			`}`

			`hashwrite_be32(f, edge_value);`

			`if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {`
			`do {`
			`num_extra_edges++;`
			`parent = parent->next;`
			`} while (parent);`
			`}`

			`if (sizeof((*list)->date) > 4)`
			`packedDate[0] = htonl(((*list)->date >> 32) & 0x3);`
			`else`
			`packedDate[0] = 0;`

commit-graph: compute generation numbers While preparing commits to be written into a commit-graph file, compute the generation numbers using a depth-first strategy. The only commits that are walked in this depth-first search are those without a precomputed generation number. Thus, computation time will be relative to the number of new commits to the commit-graph file. If a computed generation number would exceed GENERATION_NUMBER_MAX, then use GENERATION_NUMBER_MAX instead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:09 +02:00			`packedDate[0] \|= htonl((*list)->generation << 2);`

commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`packedDate[1] = htonl((*list)->date);`
			`hashwrite(f, packedDate, 8);`

			`list++;`
			`}`
			`}`

			`static void write_graph_chunk_large_edges(struct hashfile *f,`
			`struct commit **commits,`
			`int nr_commits)`
			`{`
			`struct commit **list = commits;`
			`struct commit **last = commits + nr_commits;`
			`struct commit_list *parent;`

			`while (list < last) {`
			`int num_parents = 0;`
			`for (parent = (*list)->parents; num_parents < 3 && parent;`
			`parent = parent->next)`
			`num_parents++;`

			`if (num_parents <= 2) {`
			`list++;`
			`continue;`
			`}`

			`/* Since num_parents > 2, this initializer is safe. */`
			`for (parent = (*list)->parents->next; parent; parent = parent->next) {`
			`int edge_value = sha1_pos(parent->item->object.oid.hash,`
			`commits,`
			`nr_commits,`
			`commit_to_sha1);`

			`if (edge_value < 0)`
			`edge_value = GRAPH_PARENT_MISSING;`
			`else if (!parent->next)`
			`edge_value \|= GRAPH_LAST_EDGE;`

			`hashwrite_be32(f, edge_value);`
			`}`

			`list++;`
			`}`
			`}`

			`static int commit_compare(const void _a, const void _b)`
			`{`
			`const struct object_id a = (const struct object_id )_a;`
			`const struct object_id b = (const struct object_id )_b;`
			`return oidcmp(a, b);`
			`}`

			`struct packed_commit_list {`
			`struct commit **list;`
			`int nr;`
			`int alloc;`
			`};`

			`struct packed_oid_list {`
			`struct object_id *list;`
			`int nr;`
			`int alloc;`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`struct progress *progress;`
			`int progress_done;`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`};`

			`static int add_packed_commits(const struct object_id *oid,`
			`struct packed_git *pack,`
			`uint32_t pos,`
			`void *data)`
			`{`
			`struct packed_oid_list list = (struct packed_oid_list)data;`
			`enum object_type type;`
			`off_t offset = nth_packed_object_offset(pack, pos);`
			`struct object_info oi = OBJECT_INFO_INIT;`

commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`if (list->progress)`
			`display_progress(list->progress, ++list->progress_done);`

commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`oi.typep = &type;`
Merge branch 'sb/oid-object-info' The codepath around object-info API has been taught to take the repository object (which in turn tells the API which object store the objects are to be located). * sb/oid-object-info: cache.h: allow oid_object_info to handle arbitrary repositories packfile: add repository argument to cache_or_unpack_entry packfile: add repository argument to unpack_entry packfile: add repository argument to read_object packfile: add repository argument to packed_object_info packfile: add repository argument to packed_to_object_type packfile: add repository argument to retry_bad_packed_offset cache.h: add repository argument to oid_object_info cache.h: add repository argument to oid_object_info_extended 2018-05-23 07:38:16 +02:00			`if (packed_object_info(the_repository, pack, offset, &oi) < 0)`
commit-graph.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-21 09:49:26 +02:00			`die(_("unable to get type of object %s"), oid_to_hex(oid));`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00
			`if (type != OBJ_COMMIT)`
			`return 0;`

			`ALLOC_GROW(list->list, list->nr + 1, list->alloc);`
			`oidcpy(&(list->list[list->nr]), oid);`
			`list->nr++;`

			`return 0;`
			`}`

commit-graph: close under reachability Teach write_commit_graph() to walk all parents from the commits discovered in packfiles. This prevents gaps given by loose objects or previously-missed packfiles. Also automatically add commits from the existing graph file, if it exists. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:04 +02:00			`static void add_missing_parents(struct packed_oid_list oids, struct commit commit)`
			`{`
			`struct commit_list *parent;`
			`for (parent = commit->parents; parent; parent = parent->next) {`
			`if (!(parent->item->object.flags & UNINTERESTING)) {`
			`ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);`
			`oidcpy(&oids->list[oids->nr], &(parent->item->object.oid));`
			`oids->nr++;`
			`parent->item->object.flags \|= UNINTERESTING;`
			`}`
			`}`
			`}`

commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`static void close_reachable(struct packed_oid_list *oids, int report_progress)`
commit-graph: close under reachability Teach write_commit_graph() to walk all parents from the commits discovered in packfiles. This prevents gaps given by loose objects or previously-missed packfiles. Also automatically add commits from the existing graph file, if it exists. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:04 +02:00			`{`
			`int i;`
			`struct commit *commit;`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`struct progress *progress = NULL;`
			`int j = 0;`
commit-graph: close under reachability Teach write_commit_graph() to walk all parents from the commits discovered in packfiles. This prevents gaps given by loose objects or previously-missed packfiles. Also automatically add commits from the existing graph file, if it exists. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:04 +02:00
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`if (report_progress)`
			`progress = start_delayed_progress(`
			`_("Annotating commits in commit graph"), 0);`
commit-graph: close under reachability Teach write_commit_graph() to walk all parents from the commits discovered in packfiles. This prevents gaps given by loose objects or previously-missed packfiles. Also automatically add commits from the existing graph file, if it exists. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:04 +02:00			`for (i = 0; i < oids->nr; i++) {`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`display_progress(progress, ++j);`
commit: add repository argument to lookup_commit Add a repository argument to allow callers of lookup_commit to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-29 03:21:59 +02:00			`commit = lookup_commit(the_repository, &oids->list[i]);`
commit-graph: close under reachability Teach write_commit_graph() to walk all parents from the commits discovered in packfiles. This prevents gaps given by loose objects or previously-missed packfiles. Also automatically add commits from the existing graph file, if it exists. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:04 +02:00			`if (commit)`
			`commit->object.flags \|= UNINTERESTING;`
			`}`

			`/*`
			`* As this loop runs, oids->nr may grow, but not more`
			`* than the number of missing commits in the reachable`
			`* closure.`
			`*/`
			`for (i = 0; i < oids->nr; i++) {`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`display_progress(progress, ++j);`
commit: add repository argument to lookup_commit Add a repository argument to allow callers of lookup_commit to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-29 03:21:59 +02:00			`commit = lookup_commit(the_repository, &oids->list[i]);`
commit-graph: close under reachability Teach write_commit_graph() to walk all parents from the commits discovered in packfiles. This prevents gaps given by loose objects or previously-missed packfiles. Also automatically add commits from the existing graph file, if it exists. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:04 +02:00
			`if (commit && !parse_commit(commit))`
			`add_missing_parents(oids, commit);`
			`}`

			`for (i = 0; i < oids->nr; i++) {`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`display_progress(progress, ++j);`
commit: add repository argument to lookup_commit Add a repository argument to allow callers of lookup_commit to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-29 03:21:59 +02:00			`commit = lookup_commit(the_repository, &oids->list[i]);`
commit-graph: close under reachability Teach write_commit_graph() to walk all parents from the commits discovered in packfiles. This prevents gaps given by loose objects or previously-missed packfiles. Also automatically add commits from the existing graph file, if it exists. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:04 +02:00
			`if (commit)`
			`commit->object.flags &= ~UNINTERESTING;`
			`}`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`stop_progress(&progress);`
commit-graph: close under reachability Teach write_commit_graph() to walk all parents from the commits discovered in packfiles. This prevents gaps given by loose objects or previously-missed packfiles. Also automatically add commits from the existing graph file, if it exists. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:04 +02:00			`}`

commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`static void compute_generation_numbers(struct packed_commit_list* commits,`
			`int report_progress)`
commit-graph: compute generation numbers While preparing commits to be written into a commit-graph file, compute the generation numbers using a depth-first strategy. The only commits that are walked in this depth-first search are those without a precomputed generation number. Thus, computation time will be relative to the number of new commits to the commit-graph file. If a computed generation number would exceed GENERATION_NUMBER_MAX, then use GENERATION_NUMBER_MAX instead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:09 +02:00			`{`
			`int i;`
			`struct commit_list *list = NULL;`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`struct progress *progress = NULL;`
commit-graph: compute generation numbers While preparing commits to be written into a commit-graph file, compute the generation numbers using a depth-first strategy. The only commits that are walked in this depth-first search are those without a precomputed generation number. Thus, computation time will be relative to the number of new commits to the commit-graph file. If a computed generation number would exceed GENERATION_NUMBER_MAX, then use GENERATION_NUMBER_MAX instead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:09 +02:00
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`if (report_progress)`
			`progress = start_progress(`
			`_("Computing commit graph generation numbers"),`
			`commits->nr);`
commit-graph: compute generation numbers While preparing commits to be written into a commit-graph file, compute the generation numbers using a depth-first strategy. The only commits that are walked in this depth-first search are those without a precomputed generation number. Thus, computation time will be relative to the number of new commits to the commit-graph file. If a computed generation number would exceed GENERATION_NUMBER_MAX, then use GENERATION_NUMBER_MAX instead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:09 +02:00			`for (i = 0; i < commits->nr; i++) {`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`display_progress(progress, i + 1);`
commit-graph: compute generation numbers While preparing commits to be written into a commit-graph file, compute the generation numbers using a depth-first strategy. The only commits that are walked in this depth-first search are those without a precomputed generation number. Thus, computation time will be relative to the number of new commits to the commit-graph file. If a computed generation number would exceed GENERATION_NUMBER_MAX, then use GENERATION_NUMBER_MAX instead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:09 +02:00			`if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&`
			`commits->list[i]->generation != GENERATION_NUMBER_ZERO)`
			`continue;`

			`commit_list_insert(commits->list[i], &list);`
			`while (list) {`
			`struct commit *current = list->item;`
			`struct commit_list *parent;`
			`int all_parents_computed = 1;`
			`uint32_t max_generation = 0;`

			`for (parent = current->parents; parent; parent = parent->next) {`
			`if (parent->item->generation == GENERATION_NUMBER_INFINITY \|\|`
			`parent->item->generation == GENERATION_NUMBER_ZERO) {`
			`all_parents_computed = 0;`
			`commit_list_insert(parent->item, &list);`
			`break;`
			`} else if (parent->item->generation > max_generation) {`
			`max_generation = parent->item->generation;`
			`}`
			`}`

			`if (all_parents_computed) {`
			`current->generation = max_generation + 1;`
			`pop_commit(&list);`

			`if (current->generation > GENERATION_NUMBER_MAX)`
			`current->generation = GENERATION_NUMBER_MAX;`
			`}`
			`}`
			`}`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`stop_progress(&progress);`
commit-graph: compute generation numbers While preparing commits to be written into a commit-graph file, compute the generation numbers using a depth-first strategy. The only commits that are walked in this depth-first search are those without a precomputed generation number. Thus, computation time will be relative to the number of new commits to the commit-graph file. If a computed generation number would exceed GENERATION_NUMBER_MAX, then use GENERATION_NUMBER_MAX instead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-01 14:47:09 +02:00			`}`

commit-graph: add '--reachable' option When writing commit-graph files, it can be convenient to ask for all reachable commits (starting at the ref set) in the resulting file. This is particularly helpful when writing to stdin is complicated, such as a future integration with 'git gc'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:45 +02:00			`static int add_ref_to_list(const char *refname,`
			`const struct object_id *oid,`
			`int flags, void *cb_data)`
			`{`
			`struct string_list list = (struct string_list )cb_data;`

			`string_list_append(list, oid_to_hex(oid));`
			`return 0;`
			`}`

commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`void write_commit_graph_reachable(const char *obj_dir, int append,`
			`int report_progress)`
commit-graph: add '--reachable' option When writing commit-graph files, it can be convenient to ask for all reachable commits (starting at the ref set) in the resulting file. This is particularly helpful when writing to stdin is complicated, such as a future integration with 'git gc'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:45 +02:00			`{`
commit-graph: clean up leaked memory during write The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Further, if we use a list of pack-files to find the commits, we can leak the packed_git structs after scanning them for commits. Running the following commands demonstrates the leak before and the fix after: * valgrind --leak-check=full ./git commit-graph write --reachable * valgrind --leak-check=full ./git commit-graph write --stdin-packs Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-10-03 19:12:15 +02:00			`struct string_list list = STRING_LIST_INIT_DUP;`
commit-graph: add '--reachable' option When writing commit-graph files, it can be convenient to ask for all reachable commits (starting at the ref set) in the resulting file. This is particularly helpful when writing to stdin is complicated, such as a future integration with 'git gc'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:45 +02:00
			`for_each_ref(add_ref_to_list, &list);`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`write_commit_graph(obj_dir, NULL, &list, append, report_progress);`
commit-graph: clean up leaked memory during write The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Further, if we use a list of pack-files to find the commits, we can leak the packed_git structs after scanning them for commits. Running the following commands demonstrates the leak before and the fix after: * valgrind --leak-check=full ./git commit-graph write --reachable * valgrind --leak-check=full ./git commit-graph write --stdin-packs Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-10-03 19:12:15 +02:00
			`string_list_clear(&list, 0);`
commit-graph: add '--reachable' option When writing commit-graph files, it can be convenient to ask for all reachable commits (starting at the ref set) in the resulting file. This is particularly helpful when writing to stdin is complicated, such as a future integration with 'git gc'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:45 +02:00			`}`

commit-graph: read only from specific pack-indexes Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:06 +02:00			`void write_commit_graph(const char *obj_dir,`
commit-graph: use string-list API for input Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:44 +02:00			`struct string_list *pack_indexes,`
			`struct string_list *commit_hex,`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`int append, int report_progress)`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`{`
			`struct packed_oid_list oids;`
			`struct packed_commit_list commits;`
			`struct hashfile *f;`
			`uint32_t i, count_distinct = 0;`
			`char *graph_name;`
			`struct lock_file lk = LOCK_INIT;`
			`uint32_t chunk_ids[5];`
			`uint64_t chunk_offsets[5];`
			`int num_chunks;`
			`int num_extra_edges;`
			`struct commit_list *parent;`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`struct progress *progress = NULL;`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00
commit-graph: not compatible with replace objects Create new method commit_graph_compatible(r) to check if a given repository r is compatible with the commit-graph feature. Fill the method with a check to see if replace-objects exist. Test this interaction succeeds, including ignoring an existing commit-graph and failing to write a new commit-graph. However, we do ensure that we write a new commit-graph by setting read_replace_refs to 0, thereby ignoring the replace refs. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-20 20:24:27 +02:00			`if (!commit_graph_compatible(the_repository))`
			`return;`

commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`oids.nr = 0;`
commit-graph: reduce initial oid allocation While writing a commit-graph file, we store the full list of commits in a flat list. We use this list for sorting and ensuring we are closed under reachability. The initial allocation assumed that (at most) one in four objects is a commit. This is a dramatic over-count for many repos, especially large ones. Since we grow the repo dynamically, reduce this count by a factor of eight. We still set it to a minimum of 1024 before allocating. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-10-03 19:12:19 +02:00			`oids.alloc = approximate_object_count() / 32;`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`oids.progress = NULL;`
			`oids.progress_done = 0;`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00
commit-graph: implement "--append" option Teach git-commit-graph to add all commits from the existing commit-graph file to the file about to be written. This should be used when adding new commits without performing garbage collection. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:08 +02:00			`if (append) {`
commit-graph: add repo arg to graph readers Add a struct repository argument to the functions in commit-graph.h that read the commit graph. (This commit does not affect functions that write commit graphs.) Because the commit graph functions can now read the commit graph of any repository, the global variable core_commit_graph has been removed. Instead, the config option core.commitGraph is now read on the first time in a repository that a commit is attempted to be parsed using its commit graph. This commit includes a test that exercises the functionality on an arbitrary repository that is not the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:42 +02:00			`prepare_commit_graph_one(the_repository, obj_dir);`
commit-graph: store graph in struct object_store Instead of storing commit graphs in static variables, store them in struct object_store. There are no changes to the signatures of existing functions - they all still only support the_repository, and support for other instances of struct repository will be added in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:41 +02:00			`if (the_repository->objects->commit_graph)`
			`oids.alloc += the_repository->objects->commit_graph->num_commits;`
commit-graph: implement "--append" option Teach git-commit-graph to add all commits from the existing commit-graph file to the file about to be written. This should be used when adding new commits without performing garbage collection. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:08 +02:00			`}`

commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`if (oids.alloc < 1024)`
			`oids.alloc = 1024;`
			`ALLOC_ARRAY(oids.list, oids.alloc);`

commit-graph: store graph in struct object_store Instead of storing commit graphs in static variables, store them in struct object_store. There are no changes to the signatures of existing functions - they all still only support the_repository, and support for other instances of struct repository will be added in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:41 +02:00			`if (append && the_repository->objects->commit_graph) {`
			`struct commit_graph *commit_graph =`
			`the_repository->objects->commit_graph;`
commit-graph: implement "--append" option Teach git-commit-graph to add all commits from the existing commit-graph file to the file about to be written. This should be used when adding new commits without performing garbage collection. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:08 +02:00			`for (i = 0; i < commit_graph->num_commits; i++) {`
			`const unsigned char *hash = commit_graph->chunk_oid_lookup +`
			`commit_graph->hash_len * i;`
			`hashcpy(oids.list[oids.nr++].hash, hash);`
			`}`
			`}`

commit-graph: read only from specific pack-indexes Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:06 +02:00			`if (pack_indexes) {`
			`struct strbuf packname = STRBUF_INIT;`
			`int dirlen;`
			`strbuf_addf(&packname, "%s/pack/", obj_dir);`
			`dirlen = packname.len;`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`if (report_progress) {`
			`oids.progress = start_delayed_progress(`
			`_("Finding commits for commit graph"), 0);`
			`oids.progress_done = 0;`
			`}`
commit-graph: use string-list API for input Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:44 +02:00			`for (i = 0; i < pack_indexes->nr; i++) {`
commit-graph: read only from specific pack-indexes Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:06 +02:00			`struct packed_git *p;`
			`strbuf_setlen(&packname, dirlen);`
commit-graph: use string-list API for input Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:44 +02:00			`strbuf_addstr(&packname, pack_indexes->items[i].string);`
commit-graph: read only from specific pack-indexes Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:06 +02:00			`p = add_packed_git(packname.buf, packname.len, 1);`
			`if (!p)`
commit-graph.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-21 09:49:26 +02:00			`die(_("error adding pack %s"), packname.buf);`
commit-graph: read only from specific pack-indexes Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:06 +02:00			`if (open_pack_index(p))`
commit-graph.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-21 09:49:26 +02:00			`die(_("error opening index for %s"), packname.buf);`
for_each_packed_object: support iterating in pack-order We currently iterate over objects within a pack in .idx order, which uses the object hashes. That means that it is effectively random with respect to the location of the object within the pack. If you're going to access the actual object data, there are two reasons to move linearly through the pack itself: 1. It improves the locality of access in the packfile. In the cold-cache case, this may mean fewer disk seeks, or better usage of disk cache. 2. We store related deltas together in the packfile. Which means that the delta base cache can operate much more efficiently if we visit all of those related deltas in sequence, as the earlier items are likely to still be in the cache. Whereas if we visit the objects in random order, our cache entries are much more likely to have been evicted by unrelated deltas in the meantime. So in general, if you're going to access the object contents pack order is generally going to end up more efficient. But if you're simply generating a list of object names, or if you're going to end up sorting the result anyway, you're better off just using the .idx order, as finding the pack order means generating the in-memory pack-revindex. According to the numbers in 8b8dfd5132 (pack-revindex: radix-sort the revindex, 2013-07-11), that takes about 200ms for linux.git, and 20ms for git.git (those numbers are a few years old but are still a good ballpark). That makes it a good optimization for some cases (we can save tens of seconds in git.git by having good locality of delta access, for a 20ms cost), but a bad one for others (e.g., right now "cat-file --batch-all-objects --batch-check="%(objectname)" is 170ms in git.git, so adding 20ms to that is noticeable). Hence this patch makes it an optional flag. You can't actually do any interesting timings yet, as it's not plumbed through to any user-facing tools like cat-file. That will come in a later patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-11 01:15:49 +02:00			`for_each_object_in_pack(p, add_packed_commits, &oids, 0);`
commit-graph: read only from specific pack-indexes Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:06 +02:00			`close_pack(p);`
commit-graph: clean up leaked memory during write The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Further, if we use a list of pack-files to find the commits, we can leak the packed_git structs after scanning them for commits. Running the following commands demonstrates the leak before and the fix after: * valgrind --leak-check=full ./git commit-graph write --reachable * valgrind --leak-check=full ./git commit-graph write --stdin-packs Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-10-03 19:12:15 +02:00			`free(p);`
commit-graph: read only from specific pack-indexes Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:06 +02:00			`}`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`stop_progress(&oids.progress);`
commit-graph: read only from specific pack-indexes Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:06 +02:00			`strbuf_release(&packname);`
commit-graph: build graph from starting commits Teach git-commit-graph to read commits from stdin when the --stdin-commits flag is specified. Commits reachable from these commits are added to the graph. This is a much faster way to construct the graph than inspecting all packed objects, but is restricted to known tips. For the Linux repository, 700,000+ commits were added to the graph file starting from 'master' in 7-9 seconds, depending on the number of packfiles in the repo (1, 24, or 120). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:07 +02:00			`}`

			`if (commit_hex) {`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`if (report_progress)`
			`progress = start_delayed_progress(`
			`_("Finding commits for commit graph"),`
			`commit_hex->nr);`
commit-graph: use string-list API for input Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:44 +02:00			`for (i = 0; i < commit_hex->nr; i++) {`
commit-graph: build graph from starting commits Teach git-commit-graph to read commits from stdin when the --stdin-commits flag is specified. Commits reachable from these commits are added to the graph. This is a much faster way to construct the graph than inspecting all packed objects, but is restricted to known tips. For the Linux repository, 700,000+ commits were added to the graph file starting from 'master' in 7-9 seconds, depending on the number of packfiles in the repo (1, 24, or 120). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:07 +02:00			`const char *end;`
			`struct object_id oid;`
			`struct commit *result;`

commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`display_progress(progress, i + 1);`
commit-graph: use string-list API for input Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:44 +02:00			`if (commit_hex->items[i].string &&`
			`parse_oid_hex(commit_hex->items[i].string, &oid, &end))`
commit-graph: build graph from starting commits Teach git-commit-graph to read commits from stdin when the --stdin-commits flag is specified. Commits reachable from these commits are added to the graph. This is a much faster way to construct the graph than inspecting all packed objects, but is restricted to known tips. For the Linux repository, 700,000+ commits were added to the graph file starting from 'master' in 7-9 seconds, depending on the number of packfiles in the repo (1, 24, or 120). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:07 +02:00			`continue;`

commit: add repository argument to lookup_commit_reference_gently Add a repository argument to allow callers of lookup_commit_reference_gently to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-29 03:21:57 +02:00			`result = lookup_commit_reference_gently(the_repository, &oid, 1);`
commit-graph: build graph from starting commits Teach git-commit-graph to read commits from stdin when the --stdin-commits flag is specified. Commits reachable from these commits are added to the graph. This is a much faster way to construct the graph than inspecting all packed objects, but is restricted to known tips. For the Linux repository, 700,000+ commits were added to the graph file starting from 'master' in 7-9 seconds, depending on the number of packfiles in the repo (1, 24, or 120). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:07 +02:00
			`if (result) {`
			`ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc);`
			`oidcpy(&oids.list[oids.nr], &(result->object.oid));`
			`oids.nr++;`
			`}`
			`}`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`stop_progress(&progress);`
commit-graph: build graph from starting commits Teach git-commit-graph to read commits from stdin when the --stdin-commits flag is specified. Commits reachable from these commits are added to the graph. This is a much faster way to construct the graph than inspecting all packed objects, but is restricted to known tips. For the Linux repository, 700,000+ commits were added to the graph file starting from 'master' in 7-9 seconds, depending on the number of packfiles in the repo (1, 24, or 120). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:07 +02:00			`}`

commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`if (!pack_indexes && !commit_hex) {`
			`if (report_progress)`
			`oids.progress = start_delayed_progress(`
			`_("Finding commits for commit graph"), 0);`
commit-graph: read only from specific pack-indexes Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:06 +02:00			`for_each_packed_object(add_packed_commits, &oids, 0);`
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`stop_progress(&oids.progress);`
			`}`
commit-graph: read only from specific pack-indexes Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-10 14:56:06 +02:00
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`close_reachable(&oids, report_progress);`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00
			`QSORT(oids.list, oids.nr, commit_compare);`

			`count_distinct = 1;`
			`for (i = 1; i < oids.nr; i++) {`
convert "oidcmp() != 0" to "!oideq()" This is the flip side of the previous two patches: checking for a non-zero oidcmp() can be more strictly expressed as inequality. Like those patches, we write "!= 0" in the coccinelle transformation, which covers by isomorphism the more common: if (oidcmp(E1, E2)) As with the previous two patches, this patch can be achieved almost entirely by running "make coccicheck"; the only differences are manual line-wrap fixes to match the original code. There is one thing to note for anybody replicating this, though: coccinelle 1.0.4 seems to miss the case in builtin/tag.c, even though it's basically the same as all the others. Running with 1.0.7 does catch this, so presumably it's just a coccinelle bug that was fixed in the interim. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-28 23:22:48 +02:00			`if (!oideq(&oids.list[i - 1], &oids.list[i]))`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`count_distinct++;`
			`}`

			`if (count_distinct >= GRAPH_PARENT_MISSING)`
			`die(_("the commit graph format cannot write %d commits"), count_distinct);`

			`commits.nr = 0;`
			`commits.alloc = count_distinct;`
			`ALLOC_ARRAY(commits.list, commits.alloc);`

			`num_extra_edges = 0;`
			`for (i = 0; i < oids.nr; i++) {`
			`int num_parents = 0;`
convert "oidcmp() == 0" to oideq() Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-28 23:22:40 +02:00			`if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`continue;`

commit: add repository argument to lookup_commit Add a repository argument to allow callers of lookup_commit to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-29 03:21:59 +02:00			`commits.list[commits.nr] = lookup_commit(the_repository, &oids.list[i]);`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`parse_commit(commits.list[commits.nr]);`

			`for (parent = commits.list[commits.nr]->parents;`
			`parent; parent = parent->next)`
			`num_parents++;`

			`if (num_parents > 2)`
			`num_extra_edges += num_parents - 1;`

			`commits.nr++;`
			`}`
			`num_chunks = num_extra_edges ? 4 : 3;`

			`if (commits.nr >= GRAPH_PARENT_MISSING)`
			`die(_("too many commits to write graph"));`

commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all \| git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx \| git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:35 +02:00			`compute_generation_numbers(&commits, report_progress);`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00
			`graph_name = get_commit_graph_filename(obj_dir);`
commit-graph: clean up leaked memory during write The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Further, if we use a list of pack-files to find the commits, we can leak the packed_git structs after scanning them for commits. Running the following commands demonstrates the leak before and the fix after: * valgrind --leak-check=full ./git commit-graph write --reachable * valgrind --leak-check=full ./git commit-graph write --stdin-packs Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-10-03 19:12:15 +02:00			`if (safe_create_leading_directories(graph_name)) {`
			`UNLEAK(graph_name);`
commit-graph: fix UX issue when .lock file exists We use the lockfile API to avoid multiple Git processes from writing to the commit-graph file in the .git/objects/info directory. In some cases, this directory may not exist, so we check for its existence. The existing code does the following when acquiring the lock: 1. Try to acquire the lock. 2. If it fails, try to create the .git/object/info directory. 3. Try to acquire the lock, failing if necessary. The problem is that if the lockfile exists, then the mkdir fails, giving an error that doesn't help the user: "fatal: cannot mkdir .git/objects/info: File exists" While technically this honors the lockfile, it does not help the user. Instead, do the following: 1. Check for existence of .git/objects/info; create if necessary. 2. Try to acquire the lock, failing if necessary. The new output looks like: fatal: Unable to create '<dir>/.git/objects/info/commit-graph.lock': File exists. Another git process seems to be running in this repository, e.g. an editor opened by 'git commit'. Please make sure all processes are terminated then try again. If it still fails, a git process may have crashed in this repository earlier: remove the file manually to continue. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-10 19:42:52 +02:00			`die_errno(_("unable to create leading directories of %s"),`
			`graph_name);`
commit-graph: clean up leaked memory during write The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Further, if we use a list of pack-files to find the commits, we can leak the packed_git structs after scanning them for commits. Running the following commands demonstrates the leak before and the fix after: * valgrind --leak-check=full ./git commit-graph write --reachable * valgrind --leak-check=full ./git commit-graph write --stdin-packs Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-10-03 19:12:15 +02:00			`}`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00
commit-graph: fix UX issue when .lock file exists We use the lockfile API to avoid multiple Git processes from writing to the commit-graph file in the .git/objects/info directory. In some cases, this directory may not exist, so we check for its existence. The existing code does the following when acquiring the lock: 1. Try to acquire the lock. 2. If it fails, try to create the .git/object/info directory. 3. Try to acquire the lock, failing if necessary. The problem is that if the lockfile exists, then the mkdir fails, giving an error that doesn't help the user: "fatal: cannot mkdir .git/objects/info: File exists" While technically this honors the lockfile, it does not help the user. Instead, do the following: 1. Check for existence of .git/objects/info; create if necessary. 2. Try to acquire the lock, failing if necessary. The new output looks like: fatal: Unable to create '<dir>/.git/objects/info/commit-graph.lock': File exists. Another git process seems to be running in this repository, e.g. an editor opened by 'git commit'. Please make sure all processes are terminated then try again. If it still fails, a git process may have crashed in this repository earlier: remove the file manually to continue. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-05-10 19:42:52 +02:00			`hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);`

			`hashwrite_be32(f, GRAPH_SIGNATURE);`

			`hashwrite_u8(f, GRAPH_VERSION);`
			`hashwrite_u8(f, GRAPH_OID_VERSION);`
			`hashwrite_u8(f, num_chunks);`
			`hashwrite_u8(f, 0); /* unused padding byte */`

			`chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;`
			`chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;`
			`chunk_ids[2] = GRAPH_CHUNKID_DATA;`
			`if (num_extra_edges)`
			`chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;`
			`else`
			`chunk_ids[3] = 0;`
			`chunk_ids[4] = 0;`

			`chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;`
			`chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;`
			`chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;`
			`chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;`
			`chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;`

			`for (i = 0; i <= num_chunks; i++) {`
			`uint32_t chunk_write[3];`

			`chunk_write[0] = htonl(chunk_ids[i]);`
			`chunk_write[1] = htonl(chunk_offsets[i] >> 32);`
			`chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);`
			`hashwrite(f, chunk_write, 12);`
			`}`

			`write_graph_chunk_fanout(f, commits.list, commits.nr);`
			`write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);`
			`write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);`
			`write_graph_chunk_large_edges(f, commits.list, commits.nr);`

commit-graph: close_commit_graph before shallow walk Call close_commit_graph() when about to start a rev-list walk that includes shallow commits. This is necessary in code paths that "fake" shallow commits for the sake of fetch. Specifically, test 351 in t5500-fetch-pack.sh runs git fetch --shallow-exclude one origin with a file-based transfer. When the "remote" has a commit-graph, we do not prevent the commit-graph from being loaded, but then the commits are intended to be dynamically transferred into shallow commits during get_shallow_commits_by_rev_list(). By closing the commit-graph before this call, we prevent this interaction. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-20 20:24:34 +02:00			`close_commit_graph(the_repository);`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM \| CSUM_FSYNC);`
			`commit_lock_file(&lk);`

commit-graph: clean up leaked memory during write The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Further, if we use a list of pack-files to find the commits, we can leak the packed_git structs after scanning them for commits. Running the following commands demonstrates the leak before and the fix after: * valgrind --leak-check=full ./git commit-graph write --reachable * valgrind --leak-check=full ./git commit-graph write --stdin-packs Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-10-03 19:12:15 +02:00			`free(graph_name);`
			`free(commits.list);`
commit-graph: implement write_commit_graph() Teach Git to write a commit graph file by checking all packed objects to see if they are commits, then store the file in the given object directory. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-04-02 22:34:19 +02:00			`free(oids.list);`
			`}`
commit-graph: add 'verify' subcommand If the commit-graph file becomes corrupt, we need a way to verify that its contents match the object database. In the manner of 'git fsck' we will implement a 'git commit-graph verify' subcommand to report all issues with the file. Add the 'verify' subcommand to the 'commit-graph' builtin and its documentation. The subcommand is currently a no-op except for loading the commit-graph into memory, which may trigger run-time errors that would be caught by normal use. Add a simple test that ensures the command returns a zero error code. If no commit-graph file exists, this is an acceptable state. Do not report any errors. Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:32 +02:00
commit-graph: verify contents match checksum The commit-graph file ends with a SHA1 hash of the previous contents. If a commit-graph file has errors but the checksum hash is correct, then we know that the problem is a bug in Git and not simply file corruption after-the-fact. Compute the checksum right away so it is the first error that appears, and make the message translatable since this error can be "corrected" by a user by simply deleting the file and recomputing. The rest of the errors are useful only to developers. Be sure to continue checking the rest of the file data if the checksum is wrong. This is important for our tests, as we break the checksum as we modify bytes of the commit-graph file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:42 +02:00			`#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2`
commit-graph: add 'verify' subcommand If the commit-graph file becomes corrupt, we need a way to verify that its contents match the object database. In the manner of 'git fsck' we will implement a 'git commit-graph verify' subcommand to report all issues with the file. Add the 'verify' subcommand to the 'commit-graph' builtin and its documentation. The subcommand is currently a no-op except for loading the commit-graph into memory, which may trigger run-time errors that would be caught by normal use. Add a simple test that ensures the command returns a zero error code. If no commit-graph file exists, this is an acceptable state. Do not report any errors. Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:32 +02:00			`static int verify_commit_graph_error;`

			`static void graph_report(const char *fmt, ...)`
			`{`
			`va_list ap;`

			`verify_commit_graph_error = 1;`
			`va_start(ap, fmt);`
			`vfprintf(stderr, fmt, ap);`
			`fprintf(stderr, "\n");`
			`va_end(ap);`
			`}`

commit-graph: verify generation number While iterating through the commit parents, perform the generation number calculation and compare against the value stored in the commit-graph. The tests demonstrate that having a different set of parents affects the generation number calculation, and this value propagates to descendants. Hence, we drop the single-line condition on the output. Since Git will ship with the commit-graph feature without generation numbers, we need to accept commit-graphs with all generation numbers equal to zero. In this case, ignore the generation number calculation. However, verify that we should never have a mix of zero and non-zero generation numbers. Create a test that sets one commit to generation zero and all following commits report a failure as they have non-zero generation in a file that contains generation number zero. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:39 +02:00			`#define GENERATION_ZERO_EXISTS 1`
			`#define GENERATION_NUMBER_EXISTS 2`

commit-graph: add 'verify' subcommand If the commit-graph file becomes corrupt, we need a way to verify that its contents match the object database. In the manner of 'git fsck' we will implement a 'git commit-graph verify' subcommand to report all issues with the file. Add the 'verify' subcommand to the 'commit-graph' builtin and its documentation. The subcommand is currently a no-op except for loading the commit-graph into memory, which may trigger run-time errors that would be caught by normal use. Add a simple test that ensures the command returns a zero error code. If no commit-graph file exists, this is an acceptable state. Do not report any errors. Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:32 +02:00			`int verify_commit_graph(struct repository r, struct commit_graph g)`
			`{`
commit-graph: verify corrupt OID fanout and lookup In the commit-graph file, the OID fanout chunk provides an index into the OID lookup. The 'verify' subcommand should find incorrect values in the fanout. Similarly, the 'verify' subcommand should find out-of-order values in the OID lookup. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:35 +02:00			`uint32_t i, cur_fanout_pos = 0;`
commit-graph: verify contents match checksum The commit-graph file ends with a SHA1 hash of the previous contents. If a commit-graph file has errors but the checksum hash is correct, then we know that the problem is a bug in Git and not simply file corruption after-the-fact. Compute the checksum right away so it is the first error that appears, and make the message translatable since this error can be "corrected" by a user by simply deleting the file and recomputing. The rest of the errors are useful only to developers. Be sure to continue checking the rest of the file data if the checksum is wrong. This is important for our tests, as we break the checksum as we modify bytes of the commit-graph file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:42 +02:00			`struct object_id prev_oid, cur_oid, checksum;`
commit-graph: verify generation number While iterating through the commit parents, perform the generation number calculation and compare against the value stored in the commit-graph. The tests demonstrate that having a different set of parents affects the generation number calculation, and this value propagates to descendants. Hence, we drop the single-line condition on the output. Since Git will ship with the commit-graph feature without generation numbers, we need to accept commit-graphs with all generation numbers equal to zero. In this case, ignore the generation number calculation. However, verify that we should never have a mix of zero and non-zero generation numbers. Create a test that sets one commit to generation zero and all following commits report a failure as they have non-zero generation in a file that contains generation number zero. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:39 +02:00			`int generation_zero = 0;`
commit-graph: verify contents match checksum The commit-graph file ends with a SHA1 hash of the previous contents. If a commit-graph file has errors but the checksum hash is correct, then we know that the problem is a bug in Git and not simply file corruption after-the-fact. Compute the checksum right away so it is the first error that appears, and make the message translatable since this error can be "corrected" by a user by simply deleting the file and recomputing. The rest of the errors are useful only to developers. Be sure to continue checking the rest of the file data if the checksum is wrong. This is important for our tests, as we break the checksum as we modify bytes of the commit-graph file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:42 +02:00			`struct hashfile *f;`
			`int devnull;`
commit-graph verify: add progress output For the reasons explained in the "commit-graph write: add progress output" commit leading up to this one, emit progress on "commit-graph verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27) "git fsck" has called this command if core.commitGraph=true, but there's been no progress output to indicate that anything was different. Now there is (on my tiny dotfiles.git repository): $ git -c core.commitGraph=true -C ~/ fsck Checking object directories: 100% (256/256), done. Checking objects: 100% (2821/2821), done. dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b Verifying commits in commit graph: 100% (867/867), done. And on a larger repository, such as the 2015-04-03-1M-git.git test repository: $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify Verifying commits in commit graph: 100% (1000447/1000447), done. real 0m7.813s [...] Since the "commit-graph verify" subcommand is never called from "git gc", we don't have to worry about passing some some "report_progress" progress variable around for this codepath. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:36 +02:00			`struct progress *progress = NULL;`
commit-graph: verify corrupt OID fanout and lookup In the commit-graph file, the OID fanout chunk provides an index into the OID lookup. The 'verify' subcommand should find incorrect values in the fanout. Similarly, the 'verify' subcommand should find out-of-order values in the OID lookup. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:35 +02:00
commit-graph: add 'verify' subcommand If the commit-graph file becomes corrupt, we need a way to verify that its contents match the object database. In the manner of 'git fsck' we will implement a 'git commit-graph verify' subcommand to report all issues with the file. Add the 'verify' subcommand to the 'commit-graph' builtin and its documentation. The subcommand is currently a no-op except for loading the commit-graph into memory, which may trigger run-time errors that would be caught by normal use. Add a simple test that ensures the command returns a zero error code. If no commit-graph file exists, this is an acceptable state. Do not report any errors. Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:32 +02:00			`if (!g) {`
			`graph_report("no commit-graph file loaded");`
			`return 1;`
			`}`

commit-graph: verify required chunks are present The commit-graph file requires the following three chunks: * OID Fanout * OID Lookup * Commit Data If any of these are missing, then the 'verify' subcommand should report a failure. This includes the chunk IDs malformed or the chunk count is truncated. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:34 +02:00			`verify_commit_graph_error = 0;`

			`if (!g->chunk_oid_fanout)`
			`graph_report("commit-graph is missing the OID Fanout chunk");`
			`if (!g->chunk_oid_lookup)`
			`graph_report("commit-graph is missing the OID Lookup chunk");`
			`if (!g->chunk_commit_data)`
			`graph_report("commit-graph is missing the Commit Data chunk");`

commit-graph: verify corrupt OID fanout and lookup In the commit-graph file, the OID fanout chunk provides an index into the OID lookup. The 'verify' subcommand should find incorrect values in the fanout. Similarly, the 'verify' subcommand should find out-of-order values in the OID lookup. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:35 +02:00			`if (verify_commit_graph_error)`
			`return verify_commit_graph_error;`

commit-graph: verify contents match checksum The commit-graph file ends with a SHA1 hash of the previous contents. If a commit-graph file has errors but the checksum hash is correct, then we know that the problem is a bug in Git and not simply file corruption after-the-fact. Compute the checksum right away so it is the first error that appears, and make the message translatable since this error can be "corrected" by a user by simply deleting the file and recomputing. The rest of the errors are useful only to developers. Be sure to continue checking the rest of the file data if the checksum is wrong. This is important for our tests, as we break the checksum as we modify bytes of the commit-graph file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:42 +02:00			`devnull = open("/dev/null", O_WRONLY);`
			`f = hashfd(devnull, NULL);`
			`hashwrite(f, g->data, g->data_len - g->hash_len);`
			`finalize_hashfile(f, checksum.hash, CSUM_CLOSE);`
convert "hashcmp() != 0" to "!hasheq()" This rounds out the previous three patches, covering the inequality logic for the "hash" variant of the functions. As with the previous three, the accompanying code changes are the mechanical result of applying the coccinelle patch; see those patches for more discussion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-28 23:22:52 +02:00			`if (!hasheq(checksum.hash, g->data + g->data_len - g->hash_len)) {`
commit-graph: verify contents match checksum The commit-graph file ends with a SHA1 hash of the previous contents. If a commit-graph file has errors but the checksum hash is correct, then we know that the problem is a bug in Git and not simply file corruption after-the-fact. Compute the checksum right away so it is the first error that appears, and make the message translatable since this error can be "corrected" by a user by simply deleting the file and recomputing. The rest of the errors are useful only to developers. Be sure to continue checking the rest of the file data if the checksum is wrong. This is important for our tests, as we break the checksum as we modify bytes of the commit-graph file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:42 +02:00			`graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt"));`
			`verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;`
			`}`

commit-graph: verify corrupt OID fanout and lookup In the commit-graph file, the OID fanout chunk provides an index into the OID lookup. The 'verify' subcommand should find incorrect values in the fanout. Similarly, the 'verify' subcommand should find out-of-order values in the OID lookup. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:35 +02:00			`for (i = 0; i < g->num_commits; i++) {`
commit-graph: verify root tree OIDs The 'verify' subcommand must compare the commit content parsed from the commit-graph against the content in the object database. Use lookup_commit() and parse_commit_in_graph_one() to parse the commits from the graph and compare against a commit that is loaded separately and parsed directly from the object database. Add checks for the root tree OID. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:37 +02:00			`struct commit *graph_commit;`

commit-graph: verify corrupt OID fanout and lookup In the commit-graph file, the OID fanout chunk provides an index into the OID lookup. The 'verify' subcommand should find incorrect values in the fanout. Similarly, the 'verify' subcommand should find out-of-order values in the OID lookup. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:35 +02:00			`hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);`

			`if (i && oidcmp(&prev_oid, &cur_oid) >= 0)`
			`graph_report("commit-graph has incorrect OID order: %s then %s",`
			`oid_to_hex(&prev_oid),`
			`oid_to_hex(&cur_oid));`

			`oidcpy(&prev_oid, &cur_oid);`

			`while (cur_oid.hash[0] > cur_fanout_pos) {`
			`uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);`

			`if (i != fanout_value)`
			`graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",`
			`cur_fanout_pos, fanout_value, i);`
			`cur_fanout_pos++;`
			`}`
commit-graph: verify root tree OIDs The 'verify' subcommand must compare the commit content parsed from the commit-graph against the content in the object database. Use lookup_commit() and parse_commit_in_graph_one() to parse the commits from the graph and compare against a commit that is loaded separately and parsed directly from the object database. Add checks for the root tree OID. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:37 +02:00
Merge branch 'ds/commit-graph-fsck' into jt/commit-graph-per-object-store * ds/commit-graph-fsck: (23 commits) coccinelle: update commit.cocci commit-graph: update design document gc: automatically write commit-graph files commit-graph: add '--reachable' option commit-graph: use string-list API for input fsck: verify commit-graph commit-graph: verify contents match checksum commit-graph: test for corrupted octopus edge commit-graph: verify commit date commit-graph: verify generation number commit-graph: verify parent list commit-graph: verify root tree OIDs commit-graph: verify objects exist commit-graph: verify corrupt OID fanout and lookup commit-graph: verify required chunks are present commit-graph: verify catches corrupt signature commit-graph: add 'verify' subcommand commit-graph: load a root tree from specific graph commit: force commit to parse from object database commit-graph: parse commit from chosen graph ... 2018-07-18 00:46:19 +02:00			`graph_commit = lookup_commit(r, &cur_oid);`
commit-graph: verify root tree OIDs The 'verify' subcommand must compare the commit content parsed from the commit-graph against the content in the object database. Use lookup_commit() and parse_commit_in_graph_one() to parse the commits from the graph and compare against a commit that is loaded separately and parsed directly from the object database. Add checks for the root tree OID. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:37 +02:00			`if (!parse_commit_in_graph_one(g, graph_commit))`
			`graph_report("failed to parse %s from commit-graph",`
			`oid_to_hex(&cur_oid));`
commit-graph: verify corrupt OID fanout and lookup In the commit-graph file, the OID fanout chunk provides an index into the OID lookup. The 'verify' subcommand should find incorrect values in the fanout. Similarly, the 'verify' subcommand should find out-of-order values in the OID lookup. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:35 +02:00			`}`

			`while (cur_fanout_pos < 256) {`
			`uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);`

			`if (g->num_commits != fanout_value)`
			`graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",`
			`cur_fanout_pos, fanout_value, i);`

			`cur_fanout_pos++;`
			`}`

commit-graph: verify contents match checksum The commit-graph file ends with a SHA1 hash of the previous contents. If a commit-graph file has errors but the checksum hash is correct, then we know that the problem is a bug in Git and not simply file corruption after-the-fact. Compute the checksum right away so it is the first error that appears, and make the message translatable since this error can be "corrected" by a user by simply deleting the file and recomputing. The rest of the errors are useful only to developers. Be sure to continue checking the rest of the file data if the checksum is wrong. This is important for our tests, as we break the checksum as we modify bytes of the commit-graph file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:42 +02:00			`if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)`
commit-graph: verify objects exist In the 'verify' subcommand, load commits directly from the object database to ensure they exist. Parse by skipping the commit-graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:36 +02:00			`return verify_commit_graph_error;`

commit-graph verify: add progress output For the reasons explained in the "commit-graph write: add progress output" commit leading up to this one, emit progress on "commit-graph verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27) "git fsck" has called this command if core.commitGraph=true, but there's been no progress output to indicate that anything was different. Now there is (on my tiny dotfiles.git repository): $ git -c core.commitGraph=true -C ~/ fsck Checking object directories: 100% (256/256), done. Checking objects: 100% (2821/2821), done. dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b Verifying commits in commit graph: 100% (867/867), done. And on a larger repository, such as the 2015-04-03-1M-git.git test repository: $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify Verifying commits in commit graph: 100% (1000447/1000447), done. real 0m7.813s [...] Since the "commit-graph verify" subcommand is never called from "git gc", we don't have to worry about passing some some "report_progress" progress variable around for this codepath. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:36 +02:00			`progress = start_progress(_("Verifying commits in commit graph"),`
			`g->num_commits);`
commit-graph: verify objects exist In the 'verify' subcommand, load commits directly from the object database to ensure they exist. Parse by skipping the commit-graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:36 +02:00			`for (i = 0; i < g->num_commits; i++) {`
commit-graph: verify root tree OIDs The 'verify' subcommand must compare the commit content parsed from the commit-graph against the content in the object database. Use lookup_commit() and parse_commit_in_graph_one() to parse the commits from the graph and compare against a commit that is loaded separately and parsed directly from the object database. Add checks for the root tree OID. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:37 +02:00			`struct commit graph_commit, odb_commit;`
commit-graph: verify parent list The commit-graph file stores parents in a two-column portion of the commit data chunk. If there is only one parent, then the second column stores 0xFFFFFFFF to indicate no second parent. The 'verify' subcommand checks the parent list for the commit loaded from the commit-graph and the one parsed from the object database. Test these checks for corrupt parents, too many parents, and wrong parents. Add a boundary check to insert_parent_or_die() for when the parent position value is out of range. The octopus merge will be tested in a later commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:38 +02:00			`struct commit_list graph_parents, odb_parents;`
commit-graph: verify generation number While iterating through the commit parents, perform the generation number calculation and compare against the value stored in the commit-graph. The tests demonstrate that having a different set of parents affects the generation number calculation, and this value propagates to descendants. Hence, we drop the single-line condition on the output. Since Git will ship with the commit-graph feature without generation numbers, we need to accept commit-graphs with all generation numbers equal to zero. In this case, ignore the generation number calculation. However, verify that we should never have a mix of zero and non-zero generation numbers. Create a test that sets one commit to generation zero and all following commits report a failure as they have non-zero generation in a file that contains generation number zero. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:39 +02:00			`uint32_t max_generation = 0;`
commit-graph: verify objects exist In the 'verify' subcommand, load commits directly from the object database to ensure they exist. Parse by skipping the commit-graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:36 +02:00
commit-graph verify: add progress output For the reasons explained in the "commit-graph write: add progress output" commit leading up to this one, emit progress on "commit-graph verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27) "git fsck" has called this command if core.commitGraph=true, but there's been no progress output to indicate that anything was different. Now there is (on my tiny dotfiles.git repository): $ git -c core.commitGraph=true -C ~/ fsck Checking object directories: 100% (256/256), done. Checking objects: 100% (2821/2821), done. dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b Verifying commits in commit graph: 100% (867/867), done. And on a larger repository, such as the 2015-04-03-1M-git.git test repository: $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify Verifying commits in commit graph: 100% (1000447/1000447), done. real 0m7.813s [...] Since the "commit-graph verify" subcommand is never called from "git gc", we don't have to worry about passing some some "report_progress" progress variable around for this codepath. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:36 +02:00			`display_progress(progress, i + 1);`
commit-graph: verify objects exist In the 'verify' subcommand, load commits directly from the object database to ensure they exist. Parse by skipping the commit-graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:36 +02:00			`hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);`

Merge branch 'ds/commit-graph-fsck' into jt/commit-graph-per-object-store * ds/commit-graph-fsck: (23 commits) coccinelle: update commit.cocci commit-graph: update design document gc: automatically write commit-graph files commit-graph: add '--reachable' option commit-graph: use string-list API for input fsck: verify commit-graph commit-graph: verify contents match checksum commit-graph: test for corrupted octopus edge commit-graph: verify commit date commit-graph: verify generation number commit-graph: verify parent list commit-graph: verify root tree OIDs commit-graph: verify objects exist commit-graph: verify corrupt OID fanout and lookup commit-graph: verify required chunks are present commit-graph: verify catches corrupt signature commit-graph: add 'verify' subcommand commit-graph: load a root tree from specific graph commit: force commit to parse from object database commit-graph: parse commit from chosen graph ... 2018-07-18 00:46:19 +02:00			`graph_commit = lookup_commit(r, &cur_oid);`
commit-graph: verify objects exist In the 'verify' subcommand, load commits directly from the object database to ensure they exist. Parse by skipping the commit-graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:36 +02:00			`odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r));`
			`if (parse_commit_internal(odb_commit, 0, 0)) {`
			`graph_report("failed to parse %s from object database",`
			`oid_to_hex(&cur_oid));`
			`continue;`
			`}`
commit-graph: verify root tree OIDs The 'verify' subcommand must compare the commit content parsed from the commit-graph against the content in the object database. Use lookup_commit() and parse_commit_in_graph_one() to parse the commits from the graph and compare against a commit that is loaded separately and parsed directly from the object database. Add checks for the root tree OID. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:37 +02:00
convert "oidcmp() != 0" to "!oideq()" This is the flip side of the previous two patches: checking for a non-zero oidcmp() can be more strictly expressed as inequality. Like those patches, we write "!= 0" in the coccinelle transformation, which covers by isomorphism the more common: if (oidcmp(E1, E2)) As with the previous two patches, this patch can be achieved almost entirely by running "make coccicheck"; the only differences are manual line-wrap fixes to match the original code. There is one thing to note for anybody replicating this, though: coccinelle 1.0.4 seems to miss the case in builtin/tag.c, even though it's basically the same as all the others. Running with 1.0.7 does catch this, so presumably it's just a coccinelle bug that was fixed in the interim. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-28 23:22:48 +02:00			`if (!oideq(&get_commit_tree_in_graph_one(g, graph_commit)->object.oid,`
commit-graph: verify root tree OIDs The 'verify' subcommand must compare the commit content parsed from the commit-graph against the content in the object database. Use lookup_commit() and parse_commit_in_graph_one() to parse the commits from the graph and compare against a commit that is loaded separately and parsed directly from the object database. Add checks for the root tree OID. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:37 +02:00			`get_commit_tree_oid(odb_commit)))`
			`graph_report("root tree OID for commit %s in commit-graph is %s != %s",`
			`oid_to_hex(&cur_oid),`
			`oid_to_hex(get_commit_tree_oid(graph_commit)),`
			`oid_to_hex(get_commit_tree_oid(odb_commit)));`
commit-graph: verify parent list The commit-graph file stores parents in a two-column portion of the commit data chunk. If there is only one parent, then the second column stores 0xFFFFFFFF to indicate no second parent. The 'verify' subcommand checks the parent list for the commit loaded from the commit-graph and the one parsed from the object database. Test these checks for corrupt parents, too many parents, and wrong parents. Add a boundary check to insert_parent_or_die() for when the parent position value is out of range. The octopus merge will be tested in a later commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:38 +02:00
			`graph_parents = graph_commit->parents;`
			`odb_parents = odb_commit->parents;`

			`while (graph_parents) {`
			`if (odb_parents == NULL) {`
			`graph_report("commit-graph parent list for commit %s is too long",`
			`oid_to_hex(&cur_oid));`
			`break;`
			`}`

convert "oidcmp() != 0" to "!oideq()" This is the flip side of the previous two patches: checking for a non-zero oidcmp() can be more strictly expressed as inequality. Like those patches, we write "!= 0" in the coccinelle transformation, which covers by isomorphism the more common: if (oidcmp(E1, E2)) As with the previous two patches, this patch can be achieved almost entirely by running "make coccicheck"; the only differences are manual line-wrap fixes to match the original code. There is one thing to note for anybody replicating this, though: coccinelle 1.0.4 seems to miss the case in builtin/tag.c, even though it's basically the same as all the others. Running with 1.0.7 does catch this, so presumably it's just a coccinelle bug that was fixed in the interim. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-28 23:22:48 +02:00			`if (!oideq(&graph_parents->item->object.oid, &odb_parents->item->object.oid))`
commit-graph: verify parent list The commit-graph file stores parents in a two-column portion of the commit data chunk. If there is only one parent, then the second column stores 0xFFFFFFFF to indicate no second parent. The 'verify' subcommand checks the parent list for the commit loaded from the commit-graph and the one parsed from the object database. Test these checks for corrupt parents, too many parents, and wrong parents. Add a boundary check to insert_parent_or_die() for when the parent position value is out of range. The octopus merge will be tested in a later commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:38 +02:00			`graph_report("commit-graph parent for %s is %s != %s",`
			`oid_to_hex(&cur_oid),`
			`oid_to_hex(&graph_parents->item->object.oid),`
			`oid_to_hex(&odb_parents->item->object.oid));`

commit-graph: verify generation number While iterating through the commit parents, perform the generation number calculation and compare against the value stored in the commit-graph. The tests demonstrate that having a different set of parents affects the generation number calculation, and this value propagates to descendants. Hence, we drop the single-line condition on the output. Since Git will ship with the commit-graph feature without generation numbers, we need to accept commit-graphs with all generation numbers equal to zero. In this case, ignore the generation number calculation. However, verify that we should never have a mix of zero and non-zero generation numbers. Create a test that sets one commit to generation zero and all following commits report a failure as they have non-zero generation in a file that contains generation number zero. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:39 +02:00			`if (graph_parents->item->generation > max_generation)`
			`max_generation = graph_parents->item->generation;`

commit-graph: verify parent list The commit-graph file stores parents in a two-column portion of the commit data chunk. If there is only one parent, then the second column stores 0xFFFFFFFF to indicate no second parent. The 'verify' subcommand checks the parent list for the commit loaded from the commit-graph and the one parsed from the object database. Test these checks for corrupt parents, too many parents, and wrong parents. Add a boundary check to insert_parent_or_die() for when the parent position value is out of range. The octopus merge will be tested in a later commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:38 +02:00			`graph_parents = graph_parents->next;`
			`odb_parents = odb_parents->next;`
			`}`

			`if (odb_parents != NULL)`
			`graph_report("commit-graph parent list for commit %s terminates early",`
			`oid_to_hex(&cur_oid));`
commit-graph: verify generation number While iterating through the commit parents, perform the generation number calculation and compare against the value stored in the commit-graph. The tests demonstrate that having a different set of parents affects the generation number calculation, and this value propagates to descendants. Hence, we drop the single-line condition on the output. Since Git will ship with the commit-graph feature without generation numbers, we need to accept commit-graphs with all generation numbers equal to zero. In this case, ignore the generation number calculation. However, verify that we should never have a mix of zero and non-zero generation numbers. Create a test that sets one commit to generation zero and all following commits report a failure as they have non-zero generation in a file that contains generation number zero. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:39 +02:00
			`if (!graph_commit->generation) {`
			`if (generation_zero == GENERATION_NUMBER_EXISTS)`
			`graph_report("commit-graph has generation number zero for commit %s, but non-zero elsewhere",`
			`oid_to_hex(&cur_oid));`
			`generation_zero = GENERATION_ZERO_EXISTS;`
			`} else if (generation_zero == GENERATION_ZERO_EXISTS)`
			`graph_report("commit-graph has non-zero generation number for commit %s, but zero elsewhere",`
			`oid_to_hex(&cur_oid));`

			`if (generation_zero == GENERATION_ZERO_EXISTS)`
			`continue;`

			`/*`
			`* If one of our parents has generation GENERATION_NUMBER_MAX, then`
			`* our generation is also GENERATION_NUMBER_MAX. Decrement to avoid`
			`* extra logic in the following condition.`
			`*/`
			`if (max_generation == GENERATION_NUMBER_MAX)`
			`max_generation--;`

			`if (graph_commit->generation != max_generation + 1)`
			`graph_report("commit-graph generation for commit %s is %u != %u",`
			`oid_to_hex(&cur_oid),`
			`graph_commit->generation,`
			`max_generation + 1);`
commit-graph: verify commit date Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:40 +02:00
			`if (graph_commit->date != odb_commit->date)`
			`graph_report("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime,`
			`oid_to_hex(&cur_oid),`
			`graph_commit->date,`
			`odb_commit->date);`
commit-graph: verify objects exist In the 'verify' subcommand, load commits directly from the object database to ensure they exist. Parse by skipping the commit-graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:36 +02:00			`}`
commit-graph verify: add progress output For the reasons explained in the "commit-graph write: add progress output" commit leading up to this one, emit progress on "commit-graph verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27) "git fsck" has called this command if core.commitGraph=true, but there's been no progress output to indicate that anything was different. Now there is (on my tiny dotfiles.git repository): $ git -c core.commitGraph=true -C ~/ fsck Checking object directories: 100% (256/256), done. Checking objects: 100% (2821/2821), done. dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b Verifying commits in commit graph: 100% (867/867), done. And on a larger repository, such as the 2015-04-03-1M-git.git test repository: $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify Verifying commits in commit graph: 100% (1000447/1000447), done. real 0m7.813s [...] Since the "commit-graph verify" subcommand is never called from "git gc", we don't have to worry about passing some some "report_progress" progress variable around for this codepath. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-17 17:33:36 +02:00			`stop_progress(&progress);`
commit-graph: verify objects exist In the 'verify' subcommand, load commits directly from the object database to ensure they exist. Parse by skipping the commit-graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:36 +02:00
commit-graph: add 'verify' subcommand If the commit-graph file becomes corrupt, we need a way to verify that its contents match the object database. In the manner of 'git fsck' we will implement a 'git commit-graph verify' subcommand to report all issues with the file. Add the 'verify' subcommand to the 'commit-graph' builtin and its documentation. The subcommand is currently a no-op except for loading the commit-graph into memory, which may trigger run-time errors that would be caught by normal use. Add a simple test that ensures the command returns a zero error code. If no commit-graph file exists, this is an acceptable state. Do not report any errors. Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-06-27 15:24:32 +02:00			`return verify_commit_graph_error;`
			`}`
commit-graph: add free_commit_graph Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-07-12 00:42:40 +02:00
			`void free_commit_graph(struct commit_graph *g)`
			`{`
			`if (!g)`
			`return;`
			`if (g->graph_fd >= 0) {`
			`munmap((void *)g->data, g->data_len);`
			`g->data = NULL;`
			`close(g->graph_fd);`
			`}`
			`free(g);`
			`}`