mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-16 22:14:53 +01:00

354 lines

7.8 KiB

C

Raw Normal View History

Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`#include "cache.h"`
tree/diff header cleanup. Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-03-30 08:55:43 +02:00			`#include "tree-walk.h"`
Improved three-way blob merging code This fleshes out the code that generates a three-way merge of a set of blobs. It still actually does the three-way merge using an external executable (ie just calling "merge"), but the interfaces have been cleaned up a lot and are now fully based on the 'mmfile_t' interface, so if libxdiff were to ever grow a compatible three-way-merge, it could probably be directly plugged in. It also uses the previous XDL_EMIT_COMMON functionality extension to libxdiff to generate a made-up base file for the merge for the case where no base file previously existed. This should be equivalent to what we currently do in git-merge-one-file.sh: diff -u -La/$orig -Lb/$orig $orig $src2 \| git-apply --no-add except it should be much simpler and can be done using the direct libxdiff interfaces. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-29 07:06:36 +02:00			`#include "xdiff-interface.h"`
Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00			`#include "blob.h"`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00
			`static const char merge_tree_usage[] = "git-merge-tree <base-tree> <branch1> <branch2>";`
			`static int resolve_directories = 1;`

Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00			`struct merge_list {`
			`struct merge_list *next;`
			`struct merge_list link; / other stages for this object */`

			`unsigned int stage : 2,`
			`flags : 30;`
			`unsigned int mode;`
			`const char *path;`
			`struct blob *blob;`
			`};`

			`static struct merge_list merge_result, *merge_result_end = &merge_result;`

			`static void add_merge_entry(struct merge_list *entry)`
			`{`
			`*merge_result_end = entry;`
			`merge_result_end = &entry->next;`
			`}`

Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`static void merge_trees(struct tree_desc t[3], const char *base);`

Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00			`static const char explanation(struct merge_list entry)`
			`{`
			`switch (entry->stage) {`
			`case 0:`
			`return "merged";`
			`case 3:`
			`return "added in remote";`
			`case 2:`
			`if (entry->link)`
			`return "added in both";`
			`return "added in local";`
			`}`

			`/* Existed in base */`
			`entry = entry->link;`
			`if (!entry)`
			`return "removed in both";`

			`if (entry->link)`
			`return "changed in both";`

			`if (entry->stage == 3)`
			`return "removed in local";`
			`return "removed in remote";`
			`}`

Improved three-way blob merging code This fleshes out the code that generates a three-way merge of a set of blobs. It still actually does the three-way merge using an external executable (ie just calling "merge"), but the interfaces have been cleaned up a lot and are now fully based on the 'mmfile_t' interface, so if libxdiff were to ever grow a compatible three-way-merge, it could probably be directly plugged in. It also uses the previous XDL_EMIT_COMMON functionality extension to libxdiff to generate a made-up base file for the merge for the case where no base file previously existed. This should be equivalent to what we currently do in git-merge-one-file.sh: diff -u -La/$orig -Lb/$orig $orig $src2 \| git-apply --no-add except it should be much simpler and can be done using the direct libxdiff interfaces. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-29 07:06:36 +02:00			`extern void merge_file(struct blob , struct blob , struct blob , unsigned long *);`

			`static void result(struct merge_list entry, unsigned long *size)`
			`{`
			`char type[20];`
			`struct blob base, our, *their;`

			`if (!entry->stage)`
			`return read_sha1_file(entry->blob->object.sha1, type, size);`
			`base = NULL;`
			`if (entry->stage == 1) {`
			`base = entry->blob;`
			`entry = entry->link;`
			`}`
			`our = NULL;`
			`if (entry && entry->stage == 2) {`
			`our = entry->blob;`
			`entry = entry->link;`
			`}`
			`their = NULL;`
			`if (entry)`
			`their = entry->blob;`
			`return merge_file(base, our, their, size);`
			`}`

			`static void origin(struct merge_list entry, unsigned long *size)`
			`{`
			`char type[20];`
			`while (entry) {`
			`if (entry->stage == 2)`
			`return read_sha1_file(entry->blob->object.sha1, type, size);`
			`entry = entry->link;`
			`}`
			`return NULL;`
			`}`

			`static int show_outf(void priv_, mmbuffer_t mb, int nbuf)`
			`{`
			`int i;`
			`for (i = 0; i < nbuf; i++)`
			`printf("%.*s", (int) mb[i].size, mb[i].ptr);`
			`return 0;`
			`}`

			`static void show_diff(struct merge_list *entry)`
			`{`
			`unsigned long size;`
			`mmfile_t src, dst;`
			`xpparam_t xpp;`
			`xdemitconf_t xecfg;`
			`xdemitcb_t ecb;`

			`xpp.flags = XDF_NEED_MINIMAL;`
			`xecfg.ctxlen = 3;`
			`xecfg.flags = 0;`
			`ecb.outf = show_outf;`
			`ecb.priv = NULL;`

			`src.ptr = origin(entry, &size);`
			`if (!src.ptr)`
			`size = 0;`
			`src.size = size;`
			`dst.ptr = result(entry, &size);`
			`if (!dst.ptr)`
			`size = 0;`
			`dst.size = size;`
			`xdl_diff(&src, &dst, &xpp, &xecfg, &ecb);`
			`free(src.ptr);`
			`free(dst.ptr);`
			`}`

Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00			`static void show_result_list(struct merge_list *entry)`
			`{`
			`printf("%s\n", explanation(entry));`
			`do {`
			`struct merge_list *link = entry->link;`
			`static const char *desc[4] = { "result", "base", "our", "their" };`
			`printf(" %-6s %o %s %s\n", desc[entry->stage], entry->mode, sha1_to_hex(entry->blob->object.sha1), entry->path);`
			`entry = link;`
			`} while (entry);`
			`}`

			`static void show_result(void)`
			`{`
			`struct merge_list *walk;`

			`walk = merge_result;`
			`while (walk) {`
			`show_result_list(walk);`
Improved three-way blob merging code This fleshes out the code that generates a three-way merge of a set of blobs. It still actually does the three-way merge using an external executable (ie just calling "merge"), but the interfaces have been cleaned up a lot and are now fully based on the 'mmfile_t' interface, so if libxdiff were to ever grow a compatible three-way-merge, it could probably be directly plugged in. It also uses the previous XDL_EMIT_COMMON functionality extension to libxdiff to generate a made-up base file for the merge for the case where no base file previously existed. This should be equivalent to what we currently do in git-merge-one-file.sh: diff -u -La/$orig -Lb/$orig $orig $src2 \| git-apply --no-add except it should be much simpler and can be done using the direct libxdiff interfaces. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-29 07:06:36 +02:00			`show_diff(walk);`
Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00			`walk = walk->next;`
			`}`
			`}`

Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`/* An empty entry never compares same, not even to another empty entry */`
			`static int same_entry(struct name_entry a, struct name_entry b)`
			`{`
			`return a->sha1 &&`
			`b->sha1 &&`
Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char sha1, const unsigned char sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-08-17 20:54:57 +02:00			`!hashcmp(a->sha1, b->sha1) &&`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`a->mode == b->mode;`
			`}`

Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00			`static struct merge_list create_entry(unsigned stage, unsigned mode, const unsigned char sha1, const char *path)`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`{`
Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00			`struct merge_list res = xmalloc(sizeof(res));`

			`memset(res, 0, sizeof(*res));`
			`res->stage = stage;`
			`res->path = path;`
			`res->mode = mode;`
			`res->blob = lookup_blob(sha1);`
			`return res;`
Handling large files with GIT On Tue, 14 Feb 2006, Linus Torvalds wrote: > > Here, btw, is the trivial diff to turn my previous "tree-resolve" into a > "resolve tree relative to the current branch". Gaah. It was trivial, and it happened to work fine for my test-case, but when I started looking at not doing that extremely aggressive subdirectory merging, that showed a few other issues... So in case people want to try, here's a third patch. Oh, and it's against my _original_ path, not incremental to the middle one (ie both patches two and three are against patch #1, it's not a nice series). Now I'm really done, and won't be sending out any more patches today. Sorry for the noise. Linus Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:33:02 +01:00			`}`

			`static void resolve(const char base, struct name_entry branch1, struct name_entry *result)`
			`{`
Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00			`struct merge_list orig, final;`
			`const char *path;`

Handling large files with GIT On Tue, 14 Feb 2006, Linus Torvalds wrote: > > Here, btw, is the trivial diff to turn my previous "tree-resolve" into a > "resolve tree relative to the current branch". Gaah. It was trivial, and it happened to work fine for my test-case, but when I started looking at not doing that extremely aggressive subdirectory merging, that showed a few other issues... So in case people want to try, here's a third patch. Oh, and it's against my _original_ path, not incremental to the middle one (ie both patches two and three are against patch #1, it's not a nice series). Now I'm really done, and won't be sending out any more patches today. Sorry for the noise. Linus Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:33:02 +01:00			`/* If it's already branch1, don't bother showing it */`
			`if (!branch1)`
			`return;`

Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00			`path = strdup(mkpath("%s%s", base, result->path));`
			`orig = create_entry(2, branch1->mode, branch1->sha1, path);`
			`final = create_entry(0, result->mode, result->sha1, path);`

			`final->link = orig;`

			`add_merge_entry(final);`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`}`

			`static int unresolved_directory(const char *base, struct name_entry n[3])`
			`{`
			`int baselen;`
			`char *newbase;`
			`struct name_entry *p;`
			`struct tree_desc t[3];`
			`void buf0, buf1, *buf2;`

			`if (!resolve_directories)`
			`return 0;`
			`p = n;`
			`if (!p->mode) {`
			`p++;`
			`if (!p->mode)`
			`p++;`
			`}`
			`if (!S_ISDIR(p->mode))`
			`return 0;`
			`baselen = strlen(base);`
			`newbase = xmalloc(baselen + p->pathlen + 2);`
			`memcpy(newbase, base, baselen);`
			`memcpy(newbase + baselen, p->path, p->pathlen);`
			`memcpy(newbase + baselen + p->pathlen, "/", 2);`

			`buf0 = fill_tree_descriptor(t+0, n[0].sha1);`
			`buf1 = fill_tree_descriptor(t+1, n[1].sha1);`
			`buf2 = fill_tree_descriptor(t+2, n[2].sha1);`
			`merge_trees(t, newbase);`

			`free(buf0);`
			`free(buf1);`
			`free(buf2);`
			`free(newbase);`
			`return 1;`
			`}`

Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00
			`static struct merge_list link_entry(unsigned stage, const char base, struct name_entry n, struct merge_list entry)`
			`{`
			`const char *path;`
			`struct merge_list *link;`

			`if (!n->mode)`
			`return entry;`
			`if (entry)`
			`path = entry->path;`
			`else`
			`path = strdup(mkpath("%s%s", base, n->path));`
			`link = create_entry(stage, n->mode, n->sha1, path);`
			`link->link = entry;`
			`return link;`
			`}`

Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`static void unresolved(const char *base, struct name_entry n[3])`
			`{`
Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00			`struct merge_list *entry = NULL;`

Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`if (unresolved_directory(base, n))`
			`return;`
Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00
			`/*`
			`* Do them in reverse order so that the resulting link`
			`* list has the stages in order - link_entry adds new`
			`* links at the front.`
			`*/`
			`entry = link_entry(3, base, n + 2, entry);`
			`entry = link_entry(2, base, n + 1, entry);`
			`entry = link_entry(1, base, n + 0, entry);`

			`add_merge_entry(entry);`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`}`

git-merge-tree: generalize the "traverse <n> trees in sync" functionality It's actually very useful for other things too. Notably, we could do the combined diff a lot more efficiently with this. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 04:25:32 +01:00			`/*`
			`* Merge two trees together (t[1] and t[2]), using a common base (t[0])`
			`* as the origin.`
			`*`
			`* This walks the (sorted) trees in lock-step, checking every possible`
			`* name. Note that directories automatically sort differently from other`
			`* files (see "base_name_compare"), so you'll never see file/directory`
			`* conflicts, because they won't ever compare the same.`
			`*`
			`* IOW, if a directory changes to a filename, it will automatically be`
			`* seen as the directory going away, and the filename being created.`
			`*`
			`* Think of this as a three-way diff.`
			`*`
			`* The output will be either:`
			`* - successful merge`
			`* "0 mode sha1 filename"`
			`* NOTE NOTE NOTE! FIXME! We really really need to walk the index`
			`* in parallel with this too!`
			`*`
			`* - conflict:`
			`* "1 mode sha1 filename"`
			`* "2 mode sha1 filename"`
			`* "3 mode sha1 filename"`
			`* where not all of the 1/2/3 lines may exist, of course.`
			`*`
			`* The successful merge rules are the same as for the three-way merge`
			`* in git-read-tree.`
			`*/`
			`static void threeway_callback(int n, unsigned long mask, struct name_entry entry, const char base)`
			`{`
			`/* Same in both? */`
			`if (same_entry(entry+1, entry+2)) {`
			`if (entry[0].sha1) {`
			`resolve(base, NULL, entry+1);`
			`return;`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`}`
git-merge-tree: generalize the "traverse <n> trees in sync" functionality It's actually very useful for other things too. Notably, we could do the combined diff a lot more efficiently with this. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 04:25:32 +01:00			`}`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00
git-merge-tree: generalize the "traverse <n> trees in sync" functionality It's actually very useful for other things too. Notably, we could do the combined diff a lot more efficiently with this. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 04:25:32 +01:00			`if (same_entry(entry+0, entry+1)) {`
			`if (entry[2].sha1 && !S_ISDIR(entry[2].mode)) {`
			`resolve(base, entry+1, entry+2);`
			`return;`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`}`
git-merge-tree: generalize the "traverse <n> trees in sync" functionality It's actually very useful for other things too. Notably, we could do the combined diff a lot more efficiently with this. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 04:25:32 +01:00			`}`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00
git-merge-tree: generalize the "traverse <n> trees in sync" functionality It's actually very useful for other things too. Notably, we could do the combined diff a lot more efficiently with this. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 04:25:32 +01:00			`if (same_entry(entry+0, entry+2)) {`
			`if (entry[1].sha1 && !S_ISDIR(entry[1].mode)) {`
			`resolve(base, NULL, entry+1);`
			`return;`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`}`
			`}`
git-merge-tree: generalize the "traverse <n> trees in sync" functionality It's actually very useful for other things too. Notably, we could do the combined diff a lot more efficiently with this. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-16 04:25:32 +01:00
			`unresolved(base, entry);`
			`}`

			`static void merge_trees(struct tree_desc t[3], const char *base)`
			`{`
			`traverse_trees(3, t, base, threeway_callback);`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`}`

			`static void get_tree_descriptor(struct tree_desc desc, const char *rev)`
			`{`
			`unsigned char sha1[20];`
			`void *buf;`

Separate object name errors from usage errors Separate object name errors from usage errors. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-05-08 23:43:38 +02:00			`if (get_sha1(rev, sha1))`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`die("unknown rev %s", rev);`
			`buf = fill_tree_descriptor(desc, sha1);`
			`if (!buf)`
			`die("%s is not a tree", rev);`
			`return buf;`
			`}`

			`int main(int argc, char **argv)`
			`{`
			`struct tree_desc t[3];`
			`void buf1, buf2, *buf3;`

			`if (argc < 4)`
			`usage(merge_tree_usage);`

			`buf1 = get_tree_descriptor(t+0, argv[1]);`
			`buf2 = get_tree_descriptor(t+1, argv[2]);`
			`buf3 = get_tree_descriptor(t+2, argv[3]);`
			`merge_trees(t, "");`
			`free(buf1);`
			`free(buf2);`
			`free(buf3);`
Prepare "git-merge-tree" for future work This changes how "git-merge-tree" works in two ways: - instead of printing things out as we walk the trees, we save the results in memory. - when we've walked the tree fully, we print out the results in a more explicit way, describing the data. This is basically preparatory work for extending the git-merge-tree functionality in interesting directions. In particular, git-merge-tree is also how you would create a diff between two trees _without_ necessarily creating the merge commit itself. In other words, if you were to just wonder what another branch adds, you should be able to (eventually) just do git merge-tree -p $base HEAD $otherbranch to generate a diff of what the merge would look like. The current merge tree already basically has all the smarts for this, and the explanation of the results just means that hopefully somebody else than me could do the boring work. (You'd basically be able to do the above diff by just changing the printout format for the explanation, and making the "changed in both" first do a three-way merge before it diffs the result). The other thing that the in-memory format allows is rename detection (which the current code does not do). That's the basic reason why we don't want to just explain the differences as we go along - because we want to be able to look at the _other_ differences to see whether the reason an entry got deleted in either branch was perhaps because it got added in another place.. Rename detection should be a fairly trivial pass in between the tree diffing and the explanation. In the meantime, this doesn't actually do anything new, it just outputs the information in a more verbose manner. For an example merge, commit 5ab2c0a47574c92f92ea3709b23ca35d96319edd in the git tree works well and shows renames, along with true removals and additions and files that got changed in both branches. To see that as a tree merge, do: git-merge-tree 64e86c57 c5c23745 928e47e3 where the two last ones are the tips that got merged, and the first one is the merge base. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-06-28 20:18:27 +02:00
			`show_result();`
Handling large files with GIT On Tue, 14 Feb 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > If somebody is interested in making the "lots of filename changes" case go > > fast, I'd be more than happy to walk them through what they'd need to > > change. I'm just not horribly motivated to do it myself. Hint, hint. > > In case anybody is wondering, I share the same feeling. I > cannot say I'd be "more than happy to" clean up potential > breakages during the development of such changes, but if the > change eventually would help certain use cases, I can be > persuaded to help debugging such a mess ;-). Actually, I got interested in seeing how hard this is, and wrote a simple first cut at doing a tree-optimized merger. Let me shout a bit first: THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION RATHER THAN THE FINAL PRODUCT! With that out of the way, let me descibe what this does (and then describe the missing parts). This is basically a three-way merge that works entirely on the "tree" level, rather than on the index. A lot of the _concepts_ are the same, though, and if you're familiar with the results of an index merge, some of the output will make more sense. You give it three trees: the base tree (tree 0), and the two branches to be merged (tree 1 and tree 2 respectively). It will then walk these three trees, and resolve them as it goes along. The interesting part is: - it can resolve whole sub-directories in one go, without actually even looking recursively at them. A whole subdirectory will resolve the same way as any individual files will (although that may need some modification, see later). - if it has a "content conflict", for subdirectories that means "try to do a recursive tree merge", while for non-subdirectories it's just a content conflict and we'll output the stage 1/2/3 information. - a successful merge will output a single stage 0 ("merged") entry, potentially for a whole subdirectory. - it outputs all the resolve information on stdout, so something like the recursive resolver can pretty easily parse it all. Now, the caveats: - we probably need to be more careful about subdirectory resolves. The trivial case (both branches have the exact same subdirectory) is a trivial resolve, but the other cases ("branch1 matches base, branch2 is different" probably can't be silently just resolved to the "branch2" subdirectory state, since it might involve renames into - or out of - that subdirectory) - we do not track the current index file at all, so this does not do the "check that index matches branch1" logic that the three-way merge in git-read-tree does. The theory is that we'd do a full three-way merge (ignoring the index and working directory), and then to update the working tree, we'd do a two-way "git-read-tree branch1->result" - I didn't actually make it do all the trivial resolve cases that git-read-tree does. It's a technology demonstration. Finally (a more serious caveat): - doing things through stdout may end up being so expensive that we'd need to do something else. In particular, it's likely that I should not actually output the "merge results", but instead output a "merge results as they _differ_ from branch1" However, I think this patch is already interesting enough that people who are interested in merging trees might want to look at it. Please keep in mind that tech _demo_ part, and in particular, keep in mind the final "serious caveat" part. In many ways, the really _interesting_ part of a merge is not the result, but how it _changes_ the branch we're merging into. That's particularly important as it should hopefully also mean that the output size for any reasonable case is minimal (and tracks what we actually need to do to the current state to create the final result). The code very much is organized so that doing the result as a "diff against branch1" should be quite easy/possible. I was actually going to do it, but I decided that it probably makes the output harder to read. I dunno. Anyway, let's think about this kind of approach.. Note how the code itself is actually quite small and short, although it's prbably pretty "dense". As an interesting test-case, I'd suggest this merge in the kernel: git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc which resolves beautifully (there are no actual file-level conflicts), and you can look at the output of that command to start thinking about what it does. The interesting part (perhaps) is that timing that command for me shows that it takes all of 0.004 seconds.. (the git-merge-base thing takes considerably more ;) The point is, we _can_ do the actual merge part really really quickly. Linus PS. Final note: when I say that it is "WORKING CODE", that is obviously by my standards. IOW, I tested it once and it gave reasonable results - so it must be perfect. Whether it works for anybody else, or indeed for any other test-case, is not my problem ;) Signed-off-by: Junio C Hamano <junkio@cox.net> 2006-02-15 03:05:30 +01:00			`return 0;`
			`}`