
mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-01 14:57:52 +01:00

Author	SHA1	Message	Date
Shawn O. Pearce	1fdb649c6a	Make trailing LF optional for all fast-import commands For the same reasons as the prior change we want to allow frontends to omit the trailing LF that usually delimits commands. In some cases these just make the input stream more verbose looking than it needs to be, and its just simpler for the frontend developer to get started if our parser is slightly more lenient about where an LF is required and where it isn't. To make this optional LF feature work we now have to buffer up to one line of input in command_buf. This buffering can happen if we look at the current input command but don't recognize it at this point in the code. In such a case we need to "unget" the entire line, but we cannot depend upon the stdio library to let us do ungetc() for that many characters at once. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-08-19 03:38:35 -04:00
Shawn O. Pearce	2c570cde98	Make trailing LF following fast-import `data` commands optional A few fast-import frontend developers have found it odd that we require the LF following a `data` command, especially in the exact byte count format. Technically we don't need this LF to parse the stream properly, but having it here does make the stream more readable to humans. We can easily make the LF optional by peeking at the next byte available from the stream and pushing it back into the buffer if its not LF. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-08-19 03:38:35 -04:00
Shawn O. Pearce	401d53fa35	Teach fast-import to ignore lines starting with '#' Several frontend developers have asked that some form of stream comments be permitted within a fast-import data stream. This way they can include information from their own frontend program about where specific data was taken from in the source system, or about a decision that their frontend may have made while creating the fast-import data stream. This change introduces comments in the Bourne-shell/Tcl/Perl style. Lines starting with '#' are ignored, up to and including the LF. Unlike the above mentioned three languages however we do not look for and ignore leading whitespace. This just simplifies the definition of the comment format and the code that parses them. To make comments work we had to stop using read_next_command() within cmd_data() and directly invoke read_line() during the inline variant of the function. This is necessary to retain any lines of the input data that might otherwise look like a comment to fast-import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-08-19 03:38:35 -04:00
Shawn O. Pearce	3149007475	Use handy ALLOC_GROW macro in fast-import when possible Instead of growing our buffer by hand during the inline variant of cmd_data() we can save a few lines of code and just use the nifty new ALLOC_GROW macro already available to us. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-08-19 03:38:34 -04:00
Shawn O. Pearce	ea08a6fd19	Actually allow TAG_FIXUP branches in fast-import Michael Haggerty <mhagger@alum.mit.edu> noticed while debugging a Git backend for cvs2svn that fast-import was barfing when he tried to use "TAG_FIXUP" as a branch name for temporary work needed to cleanup the tree prior to creating an annotated tag object. The reason we were rejecting the branch name was check_ref_format() returns -2 when there are less than 2 '/' characters in the input name. TAG_FIXUP has 0 '/' characters, but is technically just as valid of a ref as HEAD and MERGE_HEAD, so we really should permit it (and any other similar looking name) during import. New test cases have been added to make sure we still detect very wrong branch names (e.g. containing [ or starting with .) and yet still permit reasonable names (e.g. TAG_FIXUP). Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-08-19 03:38:34 -04:00
Alex Riesen	c905e09006	Fix whitespace in "Format of STDIN stream" of fast-import Something probably assumed that HT indentation is 4 characters. Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-08-19 03:38:34 -04:00
Luiz Fernando N. Capitulino	7647b17f1d	Use xmkstemp() instead of mkstemp() xmkstemp() performs error checking and prints a standard error message when an error occur. Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-08-14 22:20:26 -07:00
Shawn O. Pearce	b6f3481bb4	Teach fast-import to recursively copy files/directories Some source material (e.g. Subversion dump files) perform directory renames by telling us the directory was copied, then deleted in the same revision. This makes it difficult for a frontend to convert such data formats to a fast-import stream, as all the frontend has on hand is "Copy a/ to b/; Delete a/" with no details about what files are in a/, unless the frontend also kept track of all files. The new 'C' subcommand within a commit allows the frontend to make a recursive copy of one path to another path within the branch, without needing to keep track of the individual file paths. The metadata copy is performed in memory efficiently, but is implemented as a copy-immediately operation, rather than copy-on-write. With this new 'C' subcommand frontends could obviously implement an 'R' (rename) on their own as a combination of 'C' and 'D' (delete), but since we have already offered up 'R' in the past and it is a trivial thing to keep implemented I'm not going to deprecate it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-07-15 01:41:23 -04:00
Shawn O. Pearce	f39a946a1f	Support wholesale directory renames in fast-import Some source material (e.g. Subversion dump files) perform directory renames without telling us exactly which files in that subdirectory were moved. This makes it hard for a frontend to convert such data formats to a fast-import stream, as all the frontend has on hand is "Rename a/ to b/" with no details about what files are in a/, unless the frontend also kept track of all files. The new 'R' subcommand within a commit allows the frontend to rename either a file or an entire subdirectory, without needing to know the object's SHA-1 or the specific files contained within it. The rename is performed as efficiently as possible internally, making it cheaper than a 'D'/'M' pair for a file rename. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-07-09 23:06:16 -04:00
Junio C Hamano	98ee8187e4	Merge branch 'maint' * maint: Fix possible coredump with fast-import --import-marks Refactor fast-import branch creation from existing commit fast-import: Fix crash when referencing already existing objects fast-import: Fix uninitialized variable Documentation: fix git-config.xml generation	2007-05-23 22:37:23 -07:00
Shawn O. Pearce	aac65ed1bc	Fix possible coredump with fast-import --import-marks When `e8438420bb` allowed us to reload the marks table on subsequent runs of fast-import we really broke things, as we set pack_id to MAX_PACK_ID for any objects we imported into the marks table. Creating a branch from that mark should fail as we attempt to read the object through a non-existant packed_git pointer. Instead we have to use the normal Git object system to locate the older commit, as we ourselves do not have a reference to the packed_git it resides in. This bug only occurred because t9300 was not complete enough. When we added the --import-marks feature we didn't actually test its implementation enough to verify the function worked as intended. I have corrected that, and included the changes as part of this fix. Prior versions of fast-import fail the new test(s); this commit allows them to pass. Credit for this bug find goes to Simon Hausmann <simon@lst.de> as he recently identified a similiar bug in the tree lazy-loading path. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-05-24 00:50:19 -04:00
Shawn O. Pearce	654aaa37ab	Refactor fast-import branch creation from existing commit To resolve a corner case uncovered by Simon Hausmann I need to reuse the logic for the SHA-1 expression version of the 'from ' command within the mark version of the 'from ' command. This change doesn't alter any functionality, but is merely breaking the common code out to a function that I can reuse. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-05-24 00:11:48 -04:00
Simon Hausmann	20f546a86c	fast-import: Fix crash when referencing already existing objects Commit `a5c1780a03` sets the pack_id of existing objects to MAX_PACK_ID. When the same object is referenced later again it is found in the local object hash. With such a pack_id fast-import should not try to locate that object in the newly created pack(s). Signed-off-by: Simon Hausmann <simon@lst.de> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-05-23 23:36:47 -04:00
Simon Hausmann	b259157f3c	fast-import: Fix uninitialized variable Fix uninitialized last_object->no_free variable that is accessed in store_object. Signed-off-by: Simon Hausmann <simon@lst.de> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-05-23 23:36:47 -04:00
Sven Verdoolaege	68db31cc28	git-update-ref: add --no-deref option for overwriting/detaching ref git-checkout is also adapted to make use of this new option instead of the handcrafted command sequence. Signed-off-by: Sven Verdoolaege <skimo@kotnet.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-05-10 15:24:44 -07:00
Dana L. How	8b0eca7c7b	Create pack-write.c for common pack writing code Include a generalized fixup_pack_header_footer() in this new file. Needed by git-repack --max-pack-size feature in a later patchset. [sp: Moved close(pack_fd) to callers, to support index-pack, and changed name to better indicate it is for packfiles.] Signed-off-by: Dana L. How <danahow@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-05-02 13:24:18 -04:00
Junio C Hamano	39231b1c32	Merge branch 'maint' * maint: http.c: Fix problem with repeated calls of http_init Add missing reference to GIT_COMMITTER_DATE in git-commit-tree documentation Fix import-tars fix. Update .mailmap with "Michael" Do not barf on too long action description Catch empty pathnames in trees during fsck Don't allow empty pathnames in fast-import import-tars: be nice to wrong directory modes git-svn: Added 'find-rev' command git shortlog documentation: add long options and fix a typo	2007-04-29 01:52:43 -07:00
Shawn O. Pearce	475d1b333a	Don't allow empty pathnames in fast-import riddochc on #git noticed corruption caused by import-tars. This was fixed in the prior commit by Dscho, but fast-import was wrong to have allowed a tree to be created with an empty string as the filename. No operating system allows this, and Git itself doesn't accept this into the index. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-04-28 20:03:25 -04:00
Sami Farin	00be8dcc1a	fast-import: size_t vs ssize_t size_t is unsigned, so (n < 0) is never true. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-04-24 16:14:48 -04:00
Shawn O. Pearce	a5c1780a03	Don't repack existing objects in fast-import Some users of fast-import have been trying to use it to rewrite commits and trees, an activity where the all of the relevant blobs are already available from the existing packfiles. In such a case we don't want to repack a blob, even if the frontend application has supplied us the raw data rather than a mark or a SHA-1 name. I'm intentionally only checking the packfiles that existed when fast-import started and am always ignoring all loose object files. We ignore loose objects because fast-import tends to operate on a very large number of objects in a very short timespan, and it is usually creating new objects, not reusing existing ones. In such a situtation the majority of the objects will not be found in the existing packfiles, nor will they be loose object files. If the frontend application really wants us to look at loose object files, then they can just repack the repository before running fast-import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-04-20 11:23:45 -04:00
Theodore Ts'o	46efd2d93c	Rename warn() to warning() to fix symbol conflicts on BSD and Mac OS This fixes a problem reported by Randal Schwartz: >I finally tracked down all the (albeit inconsequential) errors I was getting >on both OpenBSD and OSX. It's the warn() function in usage.c. There's >warn(3) in BSD-style distros. It'd take a "great rename" to change it, but if >someone with better C skills than I have could do that, my linker and I would >appreciate it. It was annoying to me, too, when I was doing some mergetool testing on Mac OS X, so here's a fix. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: "Randal L. Schwartz" <merlyn@stonehenge.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-03-31 01:11:11 -07:00
Nicolas Pitre	0e55181f29	make it more obvious that temporary files are temporary files When some operations are interrupted (or "die()'d" or crashed) then the partial object/pack/index file may remain around. Make it more obvious in their name that those files are temporary stuff and can be cleaned up if no operation is in progress. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-03-24 22:32:39 -07:00
Shawn O. Pearce	061e35c581	Remove unnecessary casts from fast-import Jeff King pointed out that these casts are quite unnecessary, as the compiler should be doing them anyway, and may cause problems in the future if the size of the argument for to_atom were to ever be increased. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-03-12 15:48:37 -04:00
Shawn O. Pearce	7f09ac4714	Merge branch 'maint' * maint: fast-import: grow tree storage more aggressively	2007-03-12 15:04:46 -04:00
Jeff King	f022f85f6d	fast-import: grow tree storage more aggressively When building up a tree for a commit, fast-import dynamically allocates memory for the tree entries. When more space is needed, the allocated memory is increased by a constant amount. For very large trees, this means re-allocating and memcpy()ing the memory O(n) times. To compound this problem, releasing the previous tree resource does not free the memory; it is kept in a pool for future trees. This means that each of the O(n) allocations will consume increasing amounts of memory, giving O(n^2) memory consumption. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-03-12 15:01:44 -04:00
Junio C Hamano	f45fa2a073	Merge branch 'master' of git://repo.or.cz/git/fastimport * 'master' of git://repo.or.cz/git/fastimport: Allow fast-import frontends to reload the marks table Use atomic updates to the fast-import mark file Preallocate memory earlier in fast-import	2007-03-07 23:10:05 -08:00
Shawn O. Pearce	e8438420bb	Allow fast-import frontends to reload the marks table I'm giving fast-import a lesson on how to reload the marks table using the same format it outputs with --export-marks. This way a frontend can reload the marks table from a prior import, making incremental imports less painful. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-03-07 18:07:26 -05:00
Shawn O. Pearce	60b9004cdb	Use atomic updates to the fast-import mark file When we allow fast-import frontends to reload a mark file from a prior session we want to let them use the same file as they exported the marks to. This makes it very simple for the frontend to save state across incremental imports. But we don't want to lose the old marks table if anything goes wrong while writing our current marks table. So instead of truncating and overwriting the path specified to --export-marks we use the standard lockfile code to write the current marks out to a temporary file, then rename it over the old marks table. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-03-07 18:05:38 -05:00
Shawn O. Pearce	93e72d8d8f	Preallocate memory earlier in fast-import I'm about to teach fast-import how to reload the marks file created by a prior session. The general approach that I want to use is to immediately parse the marks file when the specific argument is found in argv, thereby allowing the caller to supply multiple marks files, as the mark space can be sparsely populated. To make that work out we need to allocate our object tables before we parse the command line options. Since none of these tables depend on the command line options, we can easily relocate them. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-03-07 17:11:02 -05:00
Shawn O. Pearce	6777a59fcd	Use off_t in pack-objects/fast-import when we mean an offset Always use an off_t value in pack-objects anytime we are dealing with an offset to some data within a packfile. Also fixed a minor uintmax_t that was incorrectly defined before. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-03-07 11:06:33 -08:00
Shawn O. Pearce	c4001d92be	Use off_t when we really mean a file offset. Not all platforms have declared 'unsigned long' to be a 64 bit value, but we want to support a 64 bit packfile (or close enough anyway) in the near future as some projects are getting large enough that their packed size exceeds 4 GiB. By using off_t, the POSIX type that is declared to mean an offset within a file, we support whatever maximum file size the underlying operating system will handle. For most modern systems this is up around 2^60 or higher. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-03-07 11:06:25 -08:00
Shawn O. Pearce	3a55602eec	General const correctness fixes We shouldn't attempt to assign constant strings into char*, as the string is not writable at runtime. Likewise we should always be treating unsigned values as unsigned values, not as signed values. Most of these are very straightforward. The only exception is the (unnecessary) xstrdup/free in builtin-branch.c for the detached head case. Since this is a user-level interactive type program and that particular code path is executed no more than once, I feel that the extra xstrdup call is well worth the easy elimination of this warning. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-03-07 10:47:10 -08:00
Shawn O. Pearce	6b4318e604	Merge branch 'maint' * maint: fast-import: Fail if a non-existant commit is used for merge fast-import: Avoid infinite loop after reset [sp: Minor evil merge to deal with type_names array moving to be private in 'master'.]	2007-03-05 12:50:29 -05:00
Shawn O. Pearce	2f6dc35d2a	fast-import: Fail if a non-existant commit is used for merge Johannes Sixt noticed during one of his own imports that fast-import did not fail if a non-existant commit is referenced by SHA-1 value as an argument to the 'merge' command. This allowed the user to unknowingly create commits that would fail in fsck, as the commit contents would not be completely reachable. A side effect of this bug was that a frontend process could mark any SHA-1 object (blob, tree, tag) as a parent of a merge commit. This should also fail in fsck, as the commit is not a valid commit. We now use the same rule as the 'from' command. If a commit is referenced in the 'merge' command by hex formatted SHA-1 then the SHA-1 must be a commit or a tag that can be peeled back to a commit, the commit must already exist, and must be readable by the core Git infrastructure code. This requirement means that the commit must have existed prior to fast-import starting, or the commit must have been flushed out by a prior 'checkpoint' command. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-03-05 12:43:14 -05:00
Shawn O. Pearce	734c91f9e2	fast-import: Avoid infinite loop after reset Johannes Sixt noticed that a 'reset' command applied to a branch that is already active in the branch LRU cache can cause fast-import to relink the same branch into the LRU cache twice. This will cause the LRU cache to contain a cycle, making unload_one_branch run in an infinite loop as it tries to select the oldest branch for eviction. I have trivially fixed the problem by adding an active bit to each branch object; this bit indicates if the branch is already in the LRU and allows us to avoid trying to add it a second time. Converting the pack_id field into a bitfield makes this change take up no additional memory. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-03-05 12:31:09 -05:00
Nicolas Pitre	21666f1aae	convert object type handling from a string to a number We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-27 01:34:21 -08:00
Nicolas Pitre	df8436622f	formalize typename(), and add its reverse type_from_string() Sometime typename() is used, sometimes type_names[] is accessed directly. Let's enforce typename() all the time which allows for validating the type. Also let's add a function to go from a name to a type and use it instead of manual memcpy() when appropriate. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-27 01:34:21 -08:00
Junio C Hamano	599065a3bb	prefixcmp(): fix-up mechanical conversion. Previous step converted use of strncmp() with literal string mechanically even when the result is only used as a boolean: if (!strncmp("foo", arg, 3)) ==> if (!(-prefixcmp(arg, "foo"))) This step manually cleans them up to read: if (!prefixcmp(arg, "foo")) Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-20 22:03:15 -08:00
Junio C Hamano	cc44c7655f	Mechanical conversion to use prefixcmp() This mechanically converts strncmp() to use prefixcmp(), but only when the parameters match specific patterns, so that they can be verified easily. Leftover from this will be fixed in a separate step, including idiotic conversions like if (!strncmp("foo", arg, 3)) => if (!(-prefixcmp(arg, "foo"))) This was done by using this script in px.perl #!/usr/bin/perl -i.bak -p if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) { s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|; } if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) { s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|; } and running: $ git grep -l strncmp -- '*.c' | xargs perl px.perl Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-20 22:03:15 -08:00
Junio C Hamano	9360f27ff7	Merge branch 'maint' * maint: Check for PRIuMAX rather than NO_C99_FORMAT in fast-import.c.	2007-02-20 22:02:15 -08:00
Jason Riedy	3efb1f343a	Check for PRIuMAX rather than NO_C99_FORMAT in fast-import.c. Thanks to Simon 'corecode' Schubert <corecode@fs.ei.tum.de> for the clean-up. Defining the C99 standard PRIuMAX when necessary replaces UM_FMT and the awkward UM10_FMT. There are no direct C99 translations for other uses of NO_C99_FORMAT in git, alas. Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-20 19:10:57 -08:00
Junio C Hamano	2b2b892e36	Merge branch 'maint' * maint: Obey NO_C99_FORMAT in fast-import.c. Add a compat/strtoumax.c for Solaris 8. git-clone: Sync documentation to usage note.	2007-02-19 18:29:41 -08:00
Jason Riedy	e326bce65c	Obey NO_C99_FORMAT in fast-import.c. Define UM_FMT and UM10_FMT and use in place of %ju and %10ju, respectively. Both format as unsigned long long, so this assumes the compiler supports long long. Signed-off-by: Jason Riedy <jason@acm.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-19 18:20:49 -08:00
Junio C Hamano	4a164d48df	Merge branch 'jc/merge-base' (early part) This contains an evil merge to fast-import, in order to resolve in_merge_bases() update.	2007-02-13 16:54:35 -08:00
Shawn O. Pearce	ea5e370aa9	fast-import: Support reusing 'from' and brown paper bag fix reset. It was suggested on the mailing list that being able to use `from` in any commit to reset the current branch is useful in some types of importers, such as a darcs importer. We originally did not permit resetting an existing branch with a new `from` command during a `commit` command, but this restriction was only to help debug the hacked up cvs2svn that Jon Smirl was developing in parallel with git-fast-import. It is probably more of a problem to disallow it than to allow it. So now we permit a `from` during any `commit`. While making the changes required to permit multiple `from` commands on the same branch, I discovered we no longer needed the last_commit field to be set to 0 during a reset, so that was removed. (Reset was originally setting the field to 0 to signal cmd_from() that it was OK to execute on the branch.) While poking around in this section of fast-import I also realized the `reset` command was not working as intended if the corresponding `from` command was omitted (as allowed by the BNF grammar and the code). If `from` was omitted we cleared out the tree but we left the tree SHA-1 and parent commit SHA-1 intact. This is not what the user intended in this case. Instead they would be trying to reset the branch to have no parent and to have no tree, making the branch look new-born during the next commit. We now clear these SHA-1 values during `reset`, ensuring the branch looks new-born if `from` does not get supplied. New test cases for these were also added. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-12 12:17:31 -05:00
Shawn O. Pearce	bdf1c06dc1	fast-import: Hide the pack boundary commits by default. Most users don't need the pack boundary information that fast-import was printing to standard output, especially if they were calling it with --quiet. Those users who do want this information probably want it captured so they can go back and use it to repack the imported repository. So dumping the boundary commits to a log file makes more sense then printing them to standard output. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-11 19:45:56 -05:00
Johannes Schindelin	40db58b8dc	fast-import: Fix compile warnings Not on all platforms are size_t and unsigned long equivalent. Since I do not know how portable %z is, I play safe, and just cast the respective variables to unsigned long. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-07 09:28:23 -08:00
Shawn O. Pearce	22c9f7e4c5	Don't crash fast-import if the marks cannot be exported. Apparently fast-import used to die a horrible death if we were unable to open the marks file for output. This is slightly less than ideal, especially now that we dump the marks as part of the `checkpoint` command. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-07 02:46:35 -05:00
Shawn O. Pearce	820b931012	Dump all refs and marks during a checkpoint in fast-import. If the frontend asks us to checkpoint (via the explicit checkpoint command) its probably because they are afraid the current import will crash/fail/whatever and want to make sure they can pickup from the last checkpoint. To do that sort of recovery, we will need the current tip of every branch and tag available at the next startup. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-07 02:42:44 -05:00
Shawn O. Pearce	c499d76849	Teach fast-import how to sit quietly in the corner. Often users will be running fast-import from within a larger frontend process, and this may be a frequent periodic tool such as a future edition of `git-svn fetch`. We don't want to bombard users with our large stats output if they won't be interested in it, so `--quiet` is now an option to make gfi be more silent. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-07 02:19:31 -05:00
Shawn O. Pearce	825769a8fe	Teach fast-import how to clear the internal branch content. Some frontends may not be able to (easily) keep track of which files are included in the branch, and which aren't. Performing this tracking can be tedious and error prone for the frontend to do, especially if its foreign data source cannot supply the changed path list on a per-commit basis. fast-import now allows a frontend to request that a branch's tree be wiped clean (reset to the empty tree) at the start of a commit, allowing the frontend to feed in all paths which belong on the branch. This is ideal for a tar-file importer frontend, for example, as the frontend just needs to reformat the tar data stream into a gfi data stream, which may be something a few Perl regexps can take care of. :) Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-07 02:03:03 -05:00
Junio C Hamano	9981b6d915	S_IFLNK != 0140000 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 16:08:30 -05:00
Shawn O. Pearce	7073e69e38	Don't do non-fastforward updates in fast-import. If fast-import is being used to update an existing branch of a repository, the user may not want to lose commits if another process updates the same ref at the same time. For example, the user might be using fast-import to make just one or two commits against a live branch. We now perform a fast-forward check during the ref updating process. If updating a branch would cause commits in that branch to be lost, we skip over it and display the new SHA1 to standard error. This new default behavior can be overridden with `--force`, like git-push and git-fetch. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 16:08:06 -05:00
Shawn O. Pearce	63e0c8b364	Support RFC 2822 date parsing in fast-import. Since some frontends may be working with source material where the dates are only readily available as RFC 2822 strings, it is more friendly if fast-import exposes Git's parse_date() function to handle the conversion. This way the frontend doesn't need to perform the parsing itself. The new --date-format option to fast-import can be used by a frontend to select which format it will supply date strings in. The default is the standard `raw` Git format, which fast-import has always supported. Format rfc2822 can be used to activate the parse_date() function instead. Because fast-import could also be useful for creating new, current commits, the format `now` is also supported to generate the current system timestamp. The implementation of `now` is a trivial call to datestamp(), but is actually a whole whopping 3 lines so that fast-import can verify the frontend really meant `now`. As part of this change I have added validation of the `raw` date format. Prior to this change fast-import would accept anything in a `committer` command, even if it was seriously malformed. Now fast-import requires the '> ' near the end of the string and verifies the timestamp is formatted properly. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 14:58:30 -05:00
Shawn O. Pearce	e7d06a4b70	Remove unnecessary null pointer checks in fast-import. There is no need to check for a NULL pointer before invoking free(), the runtime library automatically performs this check anyway and does nothing if a NULL pointer is supplied. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 12:05:51 -05:00
Shawn O. Pearce	e5b1444b96	Correct minor style issue in fast-import. Junio noticed that I was using a different style in fast-import for returned pointers than the rest of Git. Before merging this code into the main git.git tree I'd like to make it consistent, as this style variation was not intentional. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 00:43:59 -05:00
Shawn O. Pearce	10e8d68820	Correct compiler warnings in fast-import. Junio noticed these warnings/errors in fast-import when compiling with `-Werror -ansi -pedantic`. A few changes are to reduce compiler warnings, while one (in cmd_merge) is a bug fix. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 00:26:49 -05:00
Shawn O. Pearce	0b868e0240	Remove --branch-log from fast-import. The --branch-log option and its associated code haven't been used in several months, as it's not really very useful for debugging fast-import or a frontend. I don't plan on supporting it in this state long-term, so I'm killing it now before it gets distributed to a wider audience. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 00:15:37 -05:00
Shawn O. Pearce	6c3aac1c69	Don't support shell-quoted refnames in fast-import. The current implementation of shell-style quoted refnames and SHA-1 expressions within fast-import contains a bad memory leak. We leak the unquoted strings used by the `from` and `merge` commands, maybe others. It's also just muddling up the docs. Since Git refnames cannot contain LF, and that is our delimiter for the end of the refname, and we accept any other character as-is, there is no reason for these strings to support quoting, except to be nice to frontends. But frontends shouldn't be expecting to use funny refs in Git, and it's just as simple to never quote them as it is to always pass them through the same quoting filter as pathnames. So frontends should never quote refs, or ref expressions. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-05 20:30:37 -05:00
Shawn O. Pearce	10831c5513	Reduce memory usage of fast-import. Some structs are allocated rather frequently, but were using integer types which were far larger than required to actually store their full value range. As packfiles are limited to 4 GiB we don't need more than 32 bits to store the offset of an object within that packfile, yet an `unsigned long` on a 64 bit system is likely a 64 bit unsigned value. Saving 4 bytes per object on a 64 bit system can add up fast on any sizable import. As atom strings are strictly single components in a path name these are probably limited to just 255 bytes by the underlying OS. Going to that short a string is probably too restrictive, but certainly `unsigned int` is far too large for their lengths. `unsigned short` is a reasonable limit. Modes within a tree really only need two bytes to store their whole value; using `unsigned int` here is vast overkill. Saving 4 bytes per file entry in an active branch can add up quickly on a project with a large number of files. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-05 16:34:56 -05:00
Shawn O. Pearce	8c1f22da9f	Include checkpoint command in the BNF. This command isn't encouraged (as it's slow) but it does exist and is accepted, so it still should be covered in the BNF. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-05 16:05:11 -05:00
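The size argument can be illustrated with two hypothetical layouts that differ only in the type of the offset field; they are not fast-import's real struct definitions. On an LP64 system the `unsigned long` version is typically 8 bytes larger per object once padding is counted. Exact sizes depend on the ABI, so the accompanying check only asserts that the narrow layout is never bigger.

```c
#include <stdint.h>

/* Hypothetical "before" layout: the offset uses unsigned long. */
struct wide_entry {
	struct wide_entry *next;
	unsigned long offset;      /* 64 bits on LP64 systems */
	unsigned char sha1[20];
};

/* Hypothetical "after" layout: packfiles are under 4 GiB, so 32 bits suffice. */
struct slim_entry {
	struct slim_entry *next;
	uint32_t offset;
	unsigned char sha1[20];
};

/* Tree entry modes fit in 16 bits: 0100644 is 33188, 0120000 is 40960. */
typedef uint16_t entry_mode;
```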
Shawn O. Pearce	76db9dec81	Merge branch 'master' into sp/gfi git-fast-import requires use of inttypes.h, but the master branch has added it to git-compat-util differently than git-fast-import originally had used it. This merge back of master to the fast-import topic is to get (and use) inttypes.h the way master is using it. This is a partially evil merge to remove the call to setup_ident(), as the master branch now contains a change which makes this implicit and therefore removed the function declaration. (commit `01754769`). Conflicts: git-compat-util.h	2007-01-30 11:07:24 -05:00
Shawn O. Pearce	b715cfbba4	Accept 'inline' file data in fast-import commit structure. It's very annoying to need to specify the file content ahead of a commit and use marks to connect the individual blobs to the commit's file modification entry, especially if the frontend can't/won't generate the blob SHA1s itself. Instead it would be much easier to use if we could accept the blob data at the same time as we receive each file_change line. Now fast-import accepts 'inline' instead of a mark idnum or blob SHA1 within the 'M' type file_change command. If an inline is detected the very next line must be a 'data n' command, supplying the file data. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-18 15:17:58 -05:00
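From the frontend side, an inline modification could be produced by a helper like the hypothetical `format_inline_modify()` below. It writes `M 100644 inline <path>`, then a `data <n>` line with the exact byte count, the raw bytes, and a trailing LF. The fixed mode 100644 and the buffer-based interface are assumptions made to keep the sketch short.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical frontend helper: format an 'M' file_change with inline data.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 * The output is not NUL-terminated, since the data may contain any byte. */
static size_t format_inline_modify(char *out, size_t cap, const char *path,
				   const char *data, size_t len)
{
	int n = snprintf(out, cap, "M 100644 inline %s\ndata %zu\n", path, len);

	if (n < 0 || (size_t)n + len + 1 > cap)
		return 0;
	memcpy(out + n, data, len);   /* exact byte count: no escaping needed */
	out[n + len] = '\n';          /* LF following the data */
	return (size_t)n + len + 1;
}
```

Because the byte count is exact, the file content may contain lines that would otherwise look like fast-import commands.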
Shawn O. Pearce	3b4dce0275	Support delimited data regions in fast-import. During testing it's nice to not have to feed the length of a data chunk to the 'data' command of fast-import. Instead we would prefer to be able to establish a data chunk much like shell's << operator and use a line delimiter to denote the end of the input. So now if a data command is started as 'data <<EOF' we will look for a terminator line containing only the string EOF on that line. Once found, we stop the data command. Everything between the two lines is used as the data value. The 'data <<' syntax is slower than 'data n', as we don't know how many bytes to expect and instead must grow our buffer on the fly. It also has the problem that the frontend must use a string which will not appear on a line by itself in the input, and the data region will always end in an LF. For these reasons real import frontends are encouraged to continue to use _only_ 'data n'. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-18 13:25:37 -05:00
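The delimited form can be sketched over an in-memory stream as follows. `read_delimited()` is a hypothetical name and the input is assumed to start just past the `data <<TERM` line. Since the total size is unknown, the buffer is grown by doubling as lines arrive.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of 'data <<TERM' handling. Returns a malloc'd buffer
 * holding every line before the terminator (each keeps its LF) and stores
 * its length in *len_out; returns NULL if the terminator never appears. */
static char *read_delimited(const char *in, const char *term, size_t *len_out)
{
	size_t term_len = strlen(term), len = 0, cap = 64;
	char *buf = malloc(cap);

	if (!buf)
		return NULL;
	while (*in) {
		const char *eol = strchr(in, '\n');
		size_t line_len = eol ? (size_t)(eol - in) : strlen(in);

		/* Stop at a line containing only the terminator. */
		if (line_len == term_len && !memcmp(in, term, term_len)) {
			*len_out = len;
			return buf;
		}
		/* Grow on the fly: the total size is not known up front. */
		while (len + line_len + 1 > cap) {
			char *p = realloc(buf, cap *= 2);
			if (!p) {
				free(buf);
				return NULL;
			}
			buf = p;
		}
		memcpy(buf + len, in, line_len);
		len += line_len;
		buf[len++] = '\n';
		if (!eol)
			break;
		in = eol + 1;
	}
	free(buf);
	return NULL;
}
```

Note how every copied line keeps its LF, which is why the data region produced this way always ends in one.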
Shawn O. Pearce	e5808826c4	Remove unnecessary options from fast-import. The --objects command line option is rather unnecessary. Internally we allocate objects in 5000 unit blocks, ensuring that any sort of malloc overhead is amortized over the individual objects to almost nothing. Since most frontends don't know how many objects they will need for a given import run (and it's hard for them to predict without just doing the run) we probably won't see anyone using --objects. Further since there's really no major benefit to using the option, most frontends won't even bother supplying it even if they could estimate the number of objects. So I'm removing it. The --max-objects-per-pack option was probably a mistake to even have added in the first place. The packfile format is limited to 4 GiB today; given that objects need at least 3 bytes of data (and probably need even more) there's no way we are going to exceed the limit of (1<<32)-1 objects before we reach the file size limit. So I'm removing it (to slightly reduce the complexity of the code) before anyone gets any wise ideas and tries to use it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-18 12:02:37 -05:00
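The object-count argument is plain arithmetic. A 32-bit object counter tops out at 2^32 - 1 = 4,294,967,295, while a 4 GiB packfile holding objects of at least 3 bytes each (the lower bound quoted in the message) can contain at most about 1.43 billion of them. The constants below are written out for illustration only.

```c
#include <stdint.h>

#define PACK_LIMIT_BYTES (UINT64_C(4) << 30)   /* 4 GiB packfile size limit */
#define MIN_OBJECT_BYTES UINT64_C(3)           /* lower bound from the message */

/* Most objects that could fit in one packfile under those assumptions. */
static uint64_t max_objects_by_size(void)
{
	return PACK_LIMIT_BYTES / MIN_OBJECT_BYTES;   /* 1,431,655,765 */
}

/* Largest count a 32-bit unsigned counter can represent. */
static uint64_t max_objects_by_count(void)
{
	return (UINT64_C(1) << 32) - 1;               /* 4,294,967,295 */
}
```

Since the size limit is reached after roughly a third of the count limit, a separate per-pack object cap adds nothing.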
Shawn O. Pearce	ebea9dd4f1	Use fixed-size integers when writing out the index in fast-import. Currently the pack .idx file format uses 32-bit unsigned integers for the fan-out table and the object offsets. We had previously defined these as 'unsigned int', but not every system will define that type to be a 32 bit value. To ensure maximum portability we should always use 'uint32_t'. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-18 11:30:17 -05:00
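For reference, the .idx fan-out table is 256 four-byte entries, where entry i counts the objects whose SHA-1 starts with a byte less than or equal to i. The sketch below (the function name and the flat input layout are assumptions) keeps the counters in `uint32_t` and serializes each one big-endian explicitly, so the on-disk width no longer depends on what `unsigned int` happens to be.

```c
#include <stddef.h>
#include <stdint.h>

/* Build the 256-entry fan-out table from n SHA-1s stored back to back
 * (20 bytes each). out receives 256 * 4 bytes in network byte order. */
static void build_fanout(const unsigned char *sha1s, size_t n,
			 unsigned char out[256 * 4])
{
	uint32_t counts[256] = {0};
	uint32_t running = 0;
	size_t i;

	for (i = 0; i < n; i++)
		counts[sha1s[i * 20]]++;       /* bucket by first byte */
	for (i = 0; i < 256; i++) {
		running += counts[i];          /* cumulative count */
		out[i * 4 + 0] = (unsigned char)(running >> 24);
		out[i * 4 + 1] = (unsigned char)(running >> 16);
		out[i * 4 + 2] = (unsigned char)(running >> 8);
		out[i * 4 + 3] = (unsigned char)running;
	}
}
```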
Shawn O. Pearce	566f44252b	Always use struct pack_header for pack header in fast-import. Previously we were using 'unsigned int' to update the hdr_entries field of the pack header after the file had been completed and was being hashed. This may not be 32 bits on all platforms. Instead we always want to use uint32_t. I'm actually cheating here by just using the pack_header like the rest of Git and letting the struct definition declare the correct type. Right now that field is still 'unsigned int' (wrong) but a pending change submitted by Simon 'corecode' Schubert changes it to uint32_t. After that change is merged in, fast-import will do the right thing all of the time. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-18 11:26:06 -05:00
Shawn O. Pearce	69e74e7412	Correct packfile edge output in fast-import. Branches are only contained by a packfile if the branch actually had its most recent commit in that packfile. So new branches are set to MAX_PACK_ID to ensure they don't cause their commit to list as part of the first packfile when it closes out if the commit was actually in existence before fast-import started. Also corrected the type of last_commit to be uintmax_t to prevent overflow and wraparound on very large imports. Though that is highly unlikely to occur as we're talking 4 billion commits, which no real project has right now. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-17 02:42:43 -05:00
Shawn O. Pearce	fd99224eec	Declare no-arg functions as (void) in fast-import. Apparently the git convention is to declare any function which takes no arguments as taking void. I did not do this during the early fast-import development, but should have. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-17 01:47:25 -05:00
Shawn O. Pearce	6f64f6d9d2	Correct a few types to be unsigned in fast-import. The length of an atom string cannot be negative. So make it explicit and declare it as an unsigned value. The shift width in a mark table node also cannot be negative. I'm also moving it to after the pointer arrays to prevent any possible alignment problems on a 64 bit system. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-17 01:13:22 -05:00
Shawn O. Pearce	2104838bf9	Corrected BNF input documentation for fast-import. Now that fast-import uses uintmax_t (the largest available unsigned integer type) for marks we don't want to say it's an unsigned 32 bit integer in ASCII base 10 notation. It could be much larger, especially on 64 bit systems, and especially if a frontend uses a very large number of marks (1 per file revision on a very, very large import). Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-17 00:33:18 -05:00
Shawn O. Pearce	2369ed7907	Print out the edge commits for each packfile in fast-import. To help callers repack very large repositories into a series of packfiles fast-import now outputs the last commits/tags it wrote to a packfile when it prints out the packfile name. This information can be fed to pack-objects --revs to repack. For the first pack of an initial import this is pretty easy (just feed those SHA1s on stdin) but for subsequent packs you want to feed the subsequent pack's final SHA1s but also all prior packs' SHA1s prefixed with the negation operator. This way the prior packs' data does not get included into the subsequent pack. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 16:18:44 -05:00
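A sketch of building that stdin for the pack with index `k`, assuming one edge commit per pack for brevity (the message says fast-import may print several). `revs_for_pack()` is hypothetical. The current pack's edge is listed as-is and every earlier pack's edge gets the `^` negation prefix.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: write the revision list for pack k into out, given one
 * edge commit (hex SHA-1) per pack. Returns bytes written, or 0 if the
 * buffer is too small. */
static size_t revs_for_pack(char *out, size_t cap, const char *const *edges,
			    size_t k)
{
	size_t used, i;
	int n = snprintf(out, cap, "%s\n", edges[k]);   /* include this pack */

	if (n < 0 || (size_t)n >= cap)
		return 0;
	used = (size_t)n;
	for (i = 0; i < k; i++) {
		/* Exclude everything already stored in earlier packs. */
		n = snprintf(out + used, cap - used, "^%s\n", edges[i]);
		if (n < 0 || (size_t)n >= cap - used)
			return 0;
		used += (size_t)n;
	}
	return used;
}
```

The resulting text would then be piped into `git pack-objects --revs`.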
Shawn O. Pearce	a7ddc48765	Correct object_count type and stat output in fast-import. Since object_count is limited to 'unsigned long' (really an unsigned 32 bit integer value) by the pack file format we may as well use exactly that type here in fast-import for that counter. An earlier change by me incorrectly made it uintmax_t. But since object_count is a counter for the current packfile only, we don't want to output its value at the end. Instead we should sum up the individual type counters and report that total, as that will cover all of the packfiles. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 04:55:41 -05:00
Shawn O. Pearce	eec11c2484	Correct max_packsize default in fast-import. Apparently amd64 has defined 'unsigned long' to be a 64 bit value, which means -1 was way over the 4 GiB packfile limit. Whoops. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 04:25:12 -05:00
Shawn O. Pearce	0fcbcae753	Remove unnecessary pack_fd global in fast-import. Much like pack_sha1, pack_fd is an unnecessary global variable; we already have the fd stored in our struct packed_git *pack_data so that the core library functions in sha1_file.c are able to look up and decompress object data that we have previously written. Keeping an extra copy of this value in our own variable is just a hold-over from earlier versions of fast-import and is now completely unnecessary. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 01:20:57 -05:00
Shawn O. Pearce	1280158738	Ensure we close the packfile after creating it in fast-import. Because we are renaming the packfile into its file destination we need to be sure it's not open when the rename is called, otherwise some operating systems (e.g. Windows) may prevent the rename from occurring. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 01:17:47 -05:00
Shawn O. Pearce	8455e48476	Use .keep files in fast-import during processing. Because fast-import automatically updates all references (heads and tags) at the end of its run the repository is corrupt unless the objects are available in the .git/objects/pack directory prior to the refs being modified. The easiest way to ensure that is true is to move the packfile and its associated index directly into the .git/objects/pack directory as soon as we have finished output to it. But the only safe way to do this is to create a temporary .keep file for that pack, so we use the same tricks that index-pack uses when it's being invoked by receive-pack. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 01:15:31 -05:00
Shawn O. Pearce	09543c96bb	Reuse sha1 in packed_git in fast-import. Rather than maintaining our own packfile level sha1 variable we can make use of the one already available in struct packed_git. It's meant for the SHA1 of the index but it can also hold the SHA1 of the packfile itself between final checksumming of the packfile and creation of the index. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 00:44:48 -05:00
Shawn O. Pearce	6cf0926193	Replace redundant yread() with read_in_full() in fast-import. Prior to git having read_in_full() fast-import used its own private function yread to perform the header reading task. No sense in keeping that around now that read_in_full is a public, stable function. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 00:35:41 -05:00
Shawn O. Pearce	0ea9f045f4	Use uintmax_t for marks in fast-import. If a frontend wants to use a mark per file revision and per commit and is doing a truly huge import (such as a 32 GiB SVN repository) we may need more than 2**32 unique mark values, especially if the frontend is unable (or unwilling) to recycle mark values. For mark idnums we should use the largest unsigned integer type available, hoping that will be at least 64 bits when we are compiled as a 64 bit executable. This way we may consume huge amounts of memory storing our mark table, but we'll at least be able to process the entire import without failing. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 00:33:36 -05:00
Shawn O. Pearce	5d6f3ef641	Corrected buffer overflow during automatic checkpoint in fast-import. If we previously were using a delta but we needed to checkpoint the current packfile and switch to a new packfile we need to throw away the delta and compress the raw object by itself, as delta chains cannot span non-thin packfiles. Unfortunately the output buffer in this case needs to grow, as the size of the compressed object may be quite a bit larger than the size of the compressed delta. I've also avoided recompressing the object if we are checkpointing and we didn't use a delta. In this case the output buffer is the correct size and has already been populated with the right data; we just need to close out the current packfile and open a new one. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 23:40:27 -05:00
Shawn O. Pearce	9d1b1b5ed7	Print the packfile names to stdout from fast-import. Caller scripts may want to know what packfiles the fast-import process just wrote out for them. This is now output to stdout, one packfile name per line, after we checkpoint each packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 08:05:01 -05:00
Shawn O. Pearce	d9ee53ce45	Implemented automatic checkpoints within fast-import. When the number of objects or number of bytes gets close to the limit allowed by the packfile format (or configured on the command line by our caller) we should automatically checkpoint the current packfile and start a new one before writing the object out. This does however require that we abandon the delta (if we had one) as it's not valid in a new packfile. I also added the simple rule that if we get a delta back but the delta itself is the same size as or larger than the uncompressed object, we ignore the delta and just store the object data. This should avoid some really bad behavior caused by our current delta strategy. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 08:00:49 -05:00
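The two rules can be sketched as predicates over a hypothetical `struct pack_state`. The field names are illustrative and the logic follows the description above rather than fast-import's source: checkpoint before an object would push either counter past its limit, and keep a delta only when it is strictly smaller than the uncompressed object and no checkpoint intervened.

```c
#include <stdint.h>

/* Hypothetical bookkeeping for the packfile currently being written. */
struct pack_state {
	uint64_t bytes_written;   /* bytes already in the current packfile */
	uint32_t object_count;    /* objects already in the current packfile */
	uint64_t max_bytes;       /* format limit or caller's configured limit */
	uint32_t max_objects;
};

/* Would writing one more object of len bytes exceed either limit? */
static int needs_checkpoint(const struct pack_state *p, uint64_t len)
{
	return p->object_count >= p->max_objects
	    || p->bytes_written + len > p->max_bytes;
}

/* A delta is kept only if it saves space and its base is still in the
 * same packfile, i.e. no checkpoint happened in between. */
static int use_delta(uint64_t delta_len, uint64_t object_len, int checkpointed)
{
	return !checkpointed && delta_len < object_len;
}
```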
Shawn O. Pearce	2fce1f3c86	Optimize index creation on large object sets in fast-import. When we are generating multiple packfiles at once we only need to scan the blocks of object_entry structs which contain objects for the current packfile. Because the most recent blocks are at the front of the linked list, and because all new objects going into the current file are allocated from the front of that list, we can stop scanning for objects as soon as we identify one which doesn't belong to the current packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 07:12:23 -05:00
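Because new objects are always taken from the front of the list, the objects for the current packfile form a prefix of it, so the scan can stop at the first mismatch. The sketch below works per object on a hypothetical `struct obj`; fast-import itself groups its object_entry structs into blocks.

```c
#include <stddef.h>

/* Hypothetical: newest objects first, each tagged with its packfile. */
struct obj {
	struct obj *next;
	unsigned int pack_id;
};

/* Count the objects of pack_id, stopping at the first older object
 * instead of walking the rest of the list. */
static size_t objects_in_pack(const struct obj *o, unsigned int pack_id)
{
	size_t n = 0;

	for (; o && o->pack_id == pack_id; o = o->next)
		n++;
	return n;
}
```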
Shawn O. Pearce	3e005baf85	Don't create a final empty packfile in fast-import. If the last packfile is going to be empty (has 0 objects) then it shouldn't be kept after the import has terminated, as there is no point to the packfile. So rather than hashing it and making the index file, just delete the packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 06:39:39 -05:00
Shawn O. Pearce	7bfe6e2613	Implemented manual packfile switching in fast-import. To help importers which are dealing with massive amounts of data fast-import needs to be able to close the packfile it is currently writing to and open a new packfile for any additional data that will be received. A new 'checkpoint' command has been introduced which can be used by the frontend import process to force this to occur at any time. This may be useful to ensure a very long running import doesn't lose any work due to unexpected failures. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 06:35:41 -05:00
Shawn O. Pearce	80144727ac	Remove unnecessary duplicate_count in fast-import. There is little reason to be keeping a global duplicate_count value when we also keep it per object type. The global counter can easily be computed at the end, once all processing has completed. This saves us a couple of machine instructions in an unimportant part of code. But it looks slightly better to me to not keep two counters around. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 06:05:22 -05:00
Shawn O. Pearce	f70b653429	Restructure fast-import to support creating multiple packfiles. Now that we are starting to see some really large projects (such as KDE or a fork of FreeBSD) get imported into Git we're running into the upper limit on packfile object count as well as overall byte length. The KDE and FreeBSD projects are both likely to require more than 4 GiB to store their current history, which means we really need multiple packfiles to handle their content. This is a fairly simple restructuring of the internal code to help us support creating multiple packfiles from within fast-import. We are now adding a 5 digit incrementing suffix to the end of the basename supplied to us by the caller, permitting up to 99,999 packs to be generated in a single fast-import run. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 04:39:05 -05:00
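A hypothetical sketch of the naming rule. The message only says a 5 digit incrementing suffix is appended to the caller's basename; the absence of a separator, the `.pack` extension, and numbering from 1 are assumptions here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: build "<base><nnnnn>.pack". Returns the name length, or
 * -1 if n needs more than 5 digits or the buffer is too small. */
static int pack_name(char *out, size_t cap, const char *base, unsigned int n)
{
	int len;

	if (n < 1 || n > 99999)
		return -1;   /* 5 digits allow at most 99,999 packs */
	len = snprintf(out, cap, "%s%05u.pack", base, n);
	return (len < 0 || (size_t)len >= cap) ? -1 : len;
}
```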
Shawn O. Pearce	03842d8e24	Misc. type cleanups within fast-import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 00:16:23 -05:00
Shawn O. Pearce	d489bc1491	Improve reuse of sha1_file library within fast-import. Now that the sha1_file.c library routines use the sliding mmap routines to perform efficient access to portions of a packfile I can remove that code from fast-import.c and just invoke it. One benefit is we now have reloading support for any packfile which uses OBJ_OFS_DELTA. Another is we have significantly less code to maintain. This code reuse change requires that fast-import generate only an OBJ_OFS_DELTA format packfile, as there is absolutely no index available to perform OBJ_REF_DELTA lookup in while unpacking an object. This is probably reasonable to require as the delta offsets result in smaller packfiles and are faster to unpack, as no index searching is required. It's also only a temporary requirement as users could always repack without offsets before making the import available to older versions of Git. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 22:33:51 -05:00
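For context, an OBJ_OFS_DELTA entry names its base by the distance back from the delta's own position in the pack. The distance is stored as big-endian base-128 digits, with the high bit set on every byte but the last and a bias of 1 applied at each continuation byte, so each value has exactly one encoding. The pair below sketches that encoding; the function names are illustrative. Since decoding needs only the delta's own bytes, no index lookup is required.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode ofs into the tail of buf (16 bytes is ample for 64 bits).
 * Returns a pointer to the first encoded byte and stores the length. */
static unsigned char *encode_ofs(uint64_t ofs, unsigned char buf[16], size_t *len)
{
	size_t pos = 15;

	buf[pos] = ofs & 127;                    /* last byte: MSB clear */
	while (ofs >>= 7)
		buf[--pos] = 128 | (--ofs & 127);    /* continuation byte, biased */
	*len = 16 - pos;
	return buf + pos;
}

/* Decode an offset produced by encode_ofs(). */
static uint64_t decode_ofs(const unsigned char *p)
{
	unsigned char c = *p++;
	uint64_t ofs = c & 127;

	while (c & 128) {
		c = *p++;
		ofs = ((ofs + 1) << 7) | (c & 127);  /* undo the bias of 1 */
	}
	return ofs;
}
```

For example, 128 encodes as the two bytes 0x80 0x00 rather than 0x81 0x00, which is what the bias buys: without it, 0x80 0x00 and 0x00 would both mean 0.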
Shawn O. Pearce	1fcdd62adf	Merge branch 'master' into sp/fast-import I'm bringing master in early so that the OBJ_OFS_DELTA implementation is available as part of the topic. This way git-fast-import can learn about this new slightly smaller and faster packfile format, and can generate them directly rather than needing to have them be repacked with git-pack-objects. Due to the API changes in master during the period of development of git-fast-import, a few minor tweaks to fast-import.c are needed to produce a working merge. I've done them here as part of the merge to ensure bisection always works. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:44:18 -05:00
Shawn O. Pearce	9938ffc53a	Allow creating branches without committing in fast-import. Some importers may want to create a branch long before they actually commit to it, or in some cases they may never commit to the branch but they still need the ref to be created in the repository after the import is complete. This extends the 'reset ' command to automatically create a new branch if the supplied reference isn't already known as a branch. While I'm at it I also modified the syntax of the reset command to terminate with an empty line, like commit and tag operate. This just makes the command set more consistent. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:12 -05:00
Shawn O. Pearce	62b6f48388	Support creation of merge commits in fast-import. Some importers are able to determine when branch merges occurred within their source data. In these cases they will want to supply the correct commits to fast-import so that a proper merge commit will exist in Git. This is now supported by supplying a 'merge ' command after the commit message and optional from command. A merge is not actually performed by fast-import; it's assumed that the frontend performed any sort of merging activity already and that fast-import should simply be storing its result. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:12 -05:00
Shawn O. Pearce	cacbdd0afb	Fix repository corruption when using marks for modified blobs. Apparently we did not copy the blob SHA1 into the stack variable 'sha1' when a mark is used to refer to a prior blob. This code was not previously tested as the Mozilla CVS -> git-fast-import program always fed us full SHA1s for modified blobs and did not use the mark feature there. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:11 -05:00
Shawn O. Pearce	8a8c55ea70	Additional fast-import tree delta corruption cleanups. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:11 -05:00
Shawn O. Pearce	b54d6422b1	Correct tree corruption problems in fast-import. The new tree delta implementation caused blob SHA1s to be used instead of a tree SHA1 when a tree was written out. This really only appeared to happen when converting an existing file to a tree, but may have been possible in some other situations. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:11 -05:00
Shawn O. Pearce	23bc886c96	Replace ywrite in fast-import with the standard write_or_die. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:10 -05:00
Shawn O. Pearce	243f801d1d	Reuse the same buffer for all commits/tags in fast-import. Since most commits and tag objects are around the same size and we only generate one at a time we can reuse the same buffer rather than xmalloc'ing and free'ing the buffer every time we generate a commit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:10 -05:00
Shawn O. Pearce	e2eb469d1f	Recycle data buffers for tree generation in fast-import. We only ever generate at most two tree streams at a time. Since most trees are around the same size we can simply recycle the buffers from one tree generation to the next rather than constantly xmalloc'ing and free'ing them. This should perform slightly better when handling a large number of trees as malloc has less work to do. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:10 -05:00
Shawn O. Pearce	4cabf8583f	Implemented tree delta compression in fast-import. We now store for every tree entry two modes and two sha1 values; the base (aka "version 0") and the current/new (aka "version 1"). When we generate a tree object we also regenerate the prior version object and use that as our base object for a delta. This strategy saves a significant amount of memory as we can continue to use the atom pool for file/directory names and only increases each tree entry by an additional 24 bytes of memory. Branches should automatically delta against their ancestor tree, unless the ancestor tree is already at the delta chain limit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:10 -05:00
Shawn O. Pearce	445b85999a	Converted hash memcpy/memcmp to new hashcpy/hashcmp/hashclr. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:09 -05:00
Shawn O. Pearce	08d7e892a7	Don't crash fast-import if no branch log was requested. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:09 -05:00
Shawn O. Pearce	5fced8dc6f	Added 'reset' command to clear a branch's tree. Sometimes an import frontend may need to work with a temporary branch which will actually contain many different branches over the life of the import. This is especially useful when the frontend needs to create a tag from a set of file versions which are otherwise never a commit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:09 -05:00
Shawn O. Pearce	53dbce78a2	Map only part of the generated pack file at any point in time. When generating a very large pack file (for example close to 1 GB in size) it may be impossible for the kernel to find a contiguous free range within a 32 bit address space for the mapping to be located at. This is especially problematic on large imports where there is a lot of malloc activity occurring within the same process and the malloc'd regions may straddle the previously mapped regions, thereby creating large holes in the address space. So instead we map only 128 MB of the pack at any given time. This will likely increase the number of times the file gets mapped (with additional system time required to update the page tables more frequently) but will allow the program to handle packs up to 4 GB in size. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:08 -05:00
Shawn O. Pearce	35ef237cf6	Fixed compile error in fast-import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:08 -05:00
Shawn O. Pearce	2eb26d8454	Fixed GPF in fast-import caused by unterminated linked list. fast-import was encountering a GPF when it ran out of free tree_entry objects but didn't know this was the cause because the last tree_entry wasn't terminated with a NULL pointer. The missing NULL pointer occurred when we allocated additional entries via xmalloc but didn't set the last tree_entry's "next" pointer to NULL. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:08 -05:00
Shawn O. Pearce	264244a042	Added --branch-log option to fast-import. This option can be used to have a record of every commit, the mark (if supplied) and branch name of the commit recorded into a log file when the commit is generated. This log can be useful to verify the results of an import as the commits can be compared to some source repository matching commits through the mark value. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:08 -05:00
Shawn O. Pearce	a6a1a831d9	Added option to export the marks table when fast-import terminates. The marks table can be used by the frontend to load any commit after the import and compare it to whatever data the frontend knows about that commit. If the mark idnums can be easily correlated to some reference source then it's relatively trivial to compare the GIT tree to the reference to verify the accuracy of the import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:07 -05:00
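A minimal free-list sketch showing where the terminator belongs. Here `struct tree_entry` is reduced to its link field, and the refill size of 100 and the helper names are arbitrary choices for illustration. Entries are never returned to malloc, which is a simplification.

```c
#include <stdlib.h>

struct tree_entry {
	struct tree_entry *next;   /* next free entry, or NULL at the end */
	/* ... real fields omitted in this sketch ... */
};

static struct tree_entry *avail_tree_entry;

/* Hypothetical refill: carve n entries out of one allocation and chain
 * them, setting the last entry's next pointer to NULL so an exhausted
 * list is detected instead of followed into garbage. */
static int grow_free_list(size_t n)
{
	struct tree_entry *e;
	size_t i;

	if (!n)
		return 0;
	e = malloc(n * sizeof(*e));
	if (!e)
		return -1;
	for (i = 0; i + 1 < n; i++)
		e[i].next = &e[i + 1];
	e[n - 1].next = NULL;   /* the terminator the bug report describes */
	avail_tree_entry = e;
	return 0;
}

static struct tree_entry *new_tree_entry(void)
{
	struct tree_entry *e;

	if (!avail_tree_entry && grow_free_list(100))
		return NULL;
	e = avail_tree_entry;
	avail_tree_entry = e->next;
	return e;
}
```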
Shawn O. Pearce	8435a9cb26	Account for tree entry memory costs in fast-import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:07 -05:00
Shawn O. Pearce	02f3389d96	Moved 'from' command to after 'data' to help cvs2svn. cvs2svn has three phases: begin_commit, middle_commit, end_commit. The ancestor is computed in the middle_commit phase. So it's easier to generate a stream if the from command appears after the commit message itself but before the file change commands. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:07 -05:00
Shawn O. Pearce	00e2b8842c	Remove branch creation command from fast-import. Jon Smirl was finding it difficult to alter cvs2svn to generate branch commands prior to the first commit of the same branch. This change moves the 'from' command to be an optional parameter of the 'commit' command, thereby allowing a new branch to be defined at the moment it gets used to create the first commit on that branch. This change makes it impossible to create a branch with no commits on it as at least one commit is needed to register the branch. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:06 -05:00
Shawn O. Pearce	8d8928b051	Round out memory pool allocations in fast-import to pointer sizes. Some architectures (e.g. SPARC) would require that we access pointers only on pointer-sized alignments. So ensure the pool allocator rounds out non-pointer sized allocations to the next pointer so we don't generate bad memory addresses. This could have occurred if we had previously allocated an atom whose string was not a whole multiple of the pointer size, for example. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:06 -05:00
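The rounding rule amounts to padding each request up to a multiple of `sizeof(void *)`. Below is a sketch with a hypothetical bump allocator `pool_alloc()` over a caller-supplied region, which must itself start pointer-aligned.

```c
#include <stddef.h>

/* Round len up to the next multiple of the pointer size. */
static size_t round_to_pointer(size_t len)
{
	size_t a = sizeof(void *);

	return (len + a - 1) / a * a;
}

/* Hypothetical pool: a region handed out front to back. */
struct mem_pool {
	char *next_free;   /* next unused byte, kept pointer-aligned */
	char *end;         /* one past the last usable byte */
};

static void *pool_alloc(struct mem_pool *p, size_t len)
{
	void *r;

	len = round_to_pointer(len);   /* keeps the next allocation aligned */
	if ((size_t)(p->end - p->next_free) < len)
		return NULL;               /* a real pool would grab a new block */
	r = p->next_free;
	p->next_free += len;
	return r;
}
```

Without the rounding, an odd-length atom string would leave `next_free` misaligned, and a later struct containing pointers could land on an address SPARC-like CPUs refuse to load from.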
Shawn O. Pearce	41e5257fcf	Implemented tree reloading in fast-import. Tree reloading allows fast-import to swap out the least-recently used branch by simply deallocating the data structures from memory that were associated with that branch. Later if the branch becomes active again it can lazily recreate those structures on demand by reloading the necessary trees from the pack file it originally wrote them to. The reloading process is implemented by mmap'ing the pack into memory and using a much tighter variant of the pack reading code contained in sha1_file.c. This was a blatant copy from sha1_file.c but the unpacking functions were significantly simplified and are actually now in a form that should make it easier to map only the necessary regions of a pack rather than the entire file. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:06 -05:00
Shawn O. Pearce	72303d44e9	Implemented 'tag' command in fast-import. Tags received from the frontend are generated in memory in a simple linked list in the order that the tag commands were sent by the frontend. If multiple different tag objects for the same tag name get generated the last one sent by the frontend will be the one that gets written out at termination. Multiple tag objects for the same name will cause all older tags of the same name to be lost. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:06 -05:00
Shawn O. Pearce	d6c7eb2c16	Added branch load counter to fast-import. If the branch load count exceeds the number of branches created then the frontend is causing fast-import to page branches into and out of memory due to the way it's ordering its commits. Performance can likely be increased if the frontend were to alter its commit sequence such that it stays on one branch before switching to another branch, then never returns to the prior branch. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:05 -05:00
Shawn O. Pearce	d83971688b	Added mark store/find to fast-import. Marks are now saved when the mark directive gets used by the frontend and may be used in place of a SHA1 expression to locate a previous SHA1 which fast-import may have generated. This is particularly useful with commits where the frontend does not (easily) have the ability to compute the SHA1 for an arbitrary commit but needs it to generate a branch or tag from that commit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:05 -05:00
Shawn O. Pearce	d5c57b284e	Converted fast-import to accept standard command line parameters. The following command line options are now accepted before the pack name: --objects=n # replaces the object count after the pack name; --depth=n # delta chain depth to use (default is 10); --active-branches=n # maximum number of branches to keep in memory. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:05 -05:00
Shawn O. Pearce	afde8dd96d	Fixed segfault in fast-import after growing a tree. Growing a tree caused all subtrees to be deallocated and put back into the free list yet those subtrees' contents were still actively in use. Consequently they were doled out again and got stomped on elsewhere. Releasing a tree is now performed in two parts, either releasing only the content array or releasing the content array and recursively releasing the subtree(s). Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:05 -05:00
Shawn O. Pearce	ace4a9d1ae	Allow symlink blobs in trees during fast-import. If a frontend is smart enough to import a symlink then we should let them do so. We'll assume that they were smart enough to first generate a blob to hold the link target, as that's how symlinks get represented in GIT. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:04 -05:00
Shawn O. Pearce	c90be46abd	Changed fast-import's pack header creation to use pack.h Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:04 -05:00
Shawn O. Pearce	c44cdc7eef	Converted fast-import to a text based protocol. Frontend clients can now send a text stream to fast-import rather than a binary stream. This should facilitate developing frontend software as the data stream is easier to view, manipulate and debug by hand and Mark-I eyeball. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:04 -05:00
Shawn O. Pearce	7111feede9	Implement blob ID validation in fast-import. When accepting revision SHA1 IDs from the frontend verify the SHA1 actually refers to a blob and is known to exist. It's an error to use a SHA1 in a tree if the blob doesn't exist as this would cause git-fsck-objects to report a missing blob should the pack get closed without the blob being appended into it or a subsequent pack. So right now we'll just ask that the frontend "pre-declare" any blobs it wants to use in a tree before it can use them. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:03 -05:00
Shawn O. Pearce	463acbe1c6	Added tree and commit writing to fast-import. The tree of the current commit can be altered by file_change commands before the commit gets written to the pack. The file changes are rather primitive as they simply allow removal of a tree entry or setting/adding a tree entry. Currently trees and commits aren't being deltafied when written to the pack and branch reloading from the current pack doesn't work, so at most 5 branches can be worked with at any one time. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:03 -05:00
Shawn O. Pearce	6bb5b3291d	Implemented branch handling and basic tree support in fast-import. This provides the basic data structures needed to store trees in memory while we are processing them for a branch. What we are attempting to do is track one complete tree for each branch that the frontend has registered with us through the 'newb' (new_branch) command. When the frontend edits that tree through 'updf' or 'delf' commands we'll mark the affected tree(s) as being dirty and recompute their objects during 'comt' (commit). Currently the protocol is decidedly _not_ user friendly. I crashed fast-import by giving it bad input data from Perl. I may try to improve upon it, or at least upon its error handling. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:03 -05:00
Shawn O. Pearce	6143f0644e	Added basic command handler to fast-import. Moved the new_blob logic off into a new subroutine and invoked it when getting the 'blob' command. Added statistics dump to STDERR when the program terminates listing what it did at a high level. This is somewhat interesting. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:03 -05:00
Shawn O. Pearce	ac47a738a7	Refactored fast-import's internals for future additions. Too many global variables were being used and not enough code was reusable to process trees and commits so this is a simple refactoring of the existing blob processing code to get into a state that will be easier to handle trees and commits in. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:02 -05:00
Shawn O. Pearce	27d6d29035	Cleaned up memory allocation for object_entry structs. Although it's easy to ask the user to tell us how many objects they will need, it's probably better to dynamically grow the object table in large units. But if the user can give us a hint as to roughly how many objects then we can still use it during startup. Also stopped printing the SHA1 strings to stdout as no user is currently making use of that facility. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:02 -05:00
Shawn O. Pearce	8bcce30126	Added automatic index generation to fast-import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:01 -05:00
Shawn O. Pearce	db5e523fdd	Created fast-import, a tool to quickly generate a pack from blobs. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:01 -05:00

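Commit 8d8928b051 rounds pool allocations up to the pointer size so that a pointer stored after an odd-length atom string is still aligned. A minimal sketch of that rounding in C follows. It assumes a single bump-pointer block from malloc(), and `struct mem_pool`, `round_to_pointer()`, `is_pointer_aligned()` and `pool_alloc()` are hypothetical names, not fast-import's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump-pointer pool; not fast-import's real structure. */
struct mem_pool {
    char *base;       /* start of the malloc'd block */
    char *next_free;  /* next byte to hand out */
    char *end;        /* one past the last usable byte */
};

/* Round len up to the next multiple of sizeof(void *). */
static size_t round_to_pointer(size_t len)
{
    size_t rem = len % sizeof(void *);
    return rem ? len + (sizeof(void *) - rem) : len;
}

static int is_pointer_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % sizeof(void *)) == 0;
}

static int pool_init(struct mem_pool *p, size_t size)
{
    /* malloc() returns memory suitably aligned for any object type. */
    p->base = malloc(size);
    if (!p->base)
        return -1;
    p->next_free = p->base;
    p->end = p->base + size;
    return 0;
}

/* Every request is rounded up, so next_free stays pointer-aligned
 * even after an odd-sized allocation such as a short atom string. */
static void *pool_alloc(struct mem_pool *p, size_t len)
{
    void *r;
    len = round_to_pointer(len);
    if ((size_t)(p->end - p->next_free) < len)
        return NULL;  /* a real pool would chain a new block here */
    r = p->next_free;
    p->next_free += len;
    return r;
}
```

Without the rounding, a 5-byte allocation followed by a pointer-sized one would place the pointer at an address that is not a multiple of sizeof(void *), which strict-alignment CPUs such as SPARC can fault on.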
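Commit 72303d44e9 keeps tags in a linked list in the order the frontend sent them, so when several tag objects share a name, the last one sent is the one written out. The following is a minimal sketch of that "last wins" rule. The fixed-size fields and the `tag_append()` and `surviving_tag()` helpers are hypothetical illustrations, and fast-import's real tag structure and ref-writing path are not shown.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical tag record, kept in the order the frontend sent it. */
struct tag {
    struct tag *next;
    char name[64];
    char sha1_hex[41];
};

struct tag_list {
    struct tag *first;
    struct tag *last;
};

/* Append to the tail so list order matches command order. */
static int tag_append(struct tag_list *l, const char *name, const char *sha1_hex)
{
    struct tag *t;
    if (strlen(name) >= sizeof(t->name) || strlen(sha1_hex) >= sizeof(t->sha1_hex))
        return -1;
    t = calloc(1, sizeof(*t));
    if (!t)
        return -1;
    strcpy(t->name, name);
    strcpy(t->sha1_hex, sha1_hex);
    if (l->last)
        l->last->next = t;
    else
        l->first = t;
    l->last = t;
    return 0;
}

/* Writing refs in list order lets a later tag overwrite an earlier one
 * with the same name; this returns the object that would survive. */
static const char *surviving_tag(const struct tag_list *l, const char *name)
{
    const char *result = NULL;
    for (const struct tag *t = l->first; t; t = t->next)
        if (!strcmp(t->name, name))
            result = t->sha1_hex;
    return result;
}

static void tag_list_free(struct tag_list *l)
{
    struct tag *t = l->first;
    while (t) {
        struct tag *next = t->next;
        free(t);
        t = next;
    }
    l->first = l->last = NULL;
}
```

The short strings such as "abc123" used with this sketch stand in for full SHA-1 hex names.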
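Commit d83971688b lets the frontend name an object with a mark and later use that mark in place of a SHA1. The following is a minimal sketch of a mark store and lookup. It assumes a flat, doubling array indexed by mark number, chosen for brevity, so fast-import's own mark structure may differ. `struct mark_table`, `mark_store()` and `mark_find()` are hypothetical names. In this sketch, storing the same mark number again overwrites the earlier object.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RAW_SHA1_LEN 20

/* Hypothetical mark table: mark number -> raw SHA-1 of the object it names. */
struct mark_table {
    unsigned char (*sha1)[RAW_SHA1_LEN];  /* sha1[idnum] holds one object name */
    unsigned char *present;               /* present[idnum] != 0 once stored */
    size_t alloc;                         /* number of slots in both arrays */
};

/* Record the object written for mark idnum, growing the table as needed. */
static int mark_store(struct mark_table *t, size_t idnum, const unsigned char *sha1)
{
    if (idnum >= t->alloc) {
        size_t n = t->alloc ? t->alloc : 16;
        void *s;
        unsigned char *p;
        while (n <= idnum)
            n *= 2;
        s = realloc(t->sha1, n * RAW_SHA1_LEN);
        if (!s)
            return -1;
        t->sha1 = s;
        p = realloc(t->present, n);
        if (!p)
            return -1;
        memset(p + t->alloc, 0, n - t->alloc);  /* new slots start unset */
        t->present = p;
        t->alloc = n;
    }
    memcpy(t->sha1[idnum], sha1, RAW_SHA1_LEN);
    t->present[idnum] = 1;
    return 0;
}

/* Resolve a mark reference; NULL means the mark was never stored. */
static const unsigned char *mark_find(const struct mark_table *t, size_t idnum)
{
    if (idnum >= t->alloc || !t->present[idnum])
        return NULL;
    return t->sha1[idnum];
}

static void mark_table_free(struct mark_table *t)
{
    free(t->sha1);
    free(t->present);
    t->sha1 = NULL;
    t->present = NULL;
    t->alloc = 0;
}
```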
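Commit afde8dd96d fixes a segfault by splitting tree release into two operations. A minimal sketch of the distinction follows, assuming plain malloc()/free() in place of fast-import's free list. The types and functions here are hypothetical. Growing a tree copies its entry pointers into a larger array, so it must free only the old array and never the subtrees those copied entries still reference.

```c
#include <stdlib.h>
#include <string.h>

struct tree_content;

/* Hypothetical tree entry: a leaf when subtree is NULL. */
struct tree_entry {
    struct tree_content *subtree;
};

struct tree_content {
    size_t entry_count;
    size_t entry_capacity;
    struct tree_entry *entries[];  /* flexible array of entry pointers */
};

static struct tree_content *new_tree_content(size_t cap)
{
    struct tree_content *t = malloc(sizeof(*t) + cap * sizeof(t->entries[0]));
    if (!t)
        return NULL;
    t->entry_count = 0;
    t->entry_capacity = cap;
    return t;
}

/* Part 1: free only the content array; entries and subtrees survive. */
static void release_tree_content(struct tree_content *t)
{
    free(t);
}

/* Part 2: free the content array plus every entry and subtree below it. */
static void release_tree_content_recursive(struct tree_content *t)
{
    for (size_t i = 0; i < t->entry_count; i++) {
        if (t->entries[i]->subtree)
            release_tree_content_recursive(t->entries[i]->subtree);
        free(t->entries[i]);
    }
    release_tree_content(t);
}

/* Growing shares entry pointers with the old array, so only part 1 is safe. */
static struct tree_content *grow_tree_content(struct tree_content *t, size_t amt)
{
    struct tree_content *r = new_tree_content(t->entry_capacity + amt);
    if (!r)
        return NULL;  /* t is left untouched on failure */
    r->entry_count = t->entry_count;
    memcpy(r->entries, t->entries, t->entry_count * sizeof(t->entries[0]));
    release_tree_content(t);
    return r;
}
```

Calling release_tree_content_recursive() inside grow_tree_content() would reproduce the bug described in the commit: the copied entries would point at freed subtrees.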