These were not caught by the previous commit, as they did not match the
regular expression.
While at it, remove the localization from one instance: we never want
BUG() messages to be translated, as they target Git developers, not the
end user (hence it would be quite unhelpful to not only burden the
translators, but then even end up with a bug report in a language that
no core Git contributor understands).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These used to be for manipulating the in-memory repo_tree structure,
but nowadays they are convenience wrappers to handle a few git-vs-svn
mismatches:
1. Git does not track empty directories but Subversion does. When
looking up a path in git that Subversion thinks exists and finding
nothing, we can safely assume that the path represents a
directory. This is needed when a later Subversion revision
modifies that directory.
2. Subversion allows deleting a file by copying. In Git fast-import
we have to handle that more explicitly as a deletion.
These are details of the tool's interaction with git fast-import.
Move them to fast_export.c, where other such details are handled.
This way the function names do not start with a repo_ prefix that
would clash with the repository object introduced in
v2.14.0-rc0~38^2~16 (repository: introduce the repository object,
2017-06-22) or an svn_ prefix that would clash with libsvn (in case
someone wants to link this code with libsvn some day).
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the rest of Git, these modes are spelled as S_IFDIR,
S_IFREG | 0644, S_IFREG | 0755, and S_IFLNK. Use the same constants
in svn-fe for simplicity and consistency.
No functional change intended.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's source code assumes that unsigned long is at least as precise as
time_t. Which is incorrect, and causes a lot of problems, in particular
where unsigned long is only 32-bit (notably on Windows, even in 64-bit
versions).
So let's just use a more appropriate data type instead. In preparation
for this, we introduce the new `timestamp_t` data type.
By necessity, this is a very, very large patch, as it has to replace all
timestamps' data type in one go.
As we will use a data type that is not necessarily identical to `time_t`,
we need to be very careful to use `time_t` whenever we interact with the
system functions, and `timestamp_t` everywhere else.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, Git's source code treats all timestamps as if they were
unsigned longs. Therefore, it is okay to write "%lu" when printing them.
There is a substantial problem with that, though: at least on Windows,
time_t is *larger* than unsigned long, and hence we will want to switch
away from the ill-specified `unsigned long` data type.
So let's introduce the pseudo format "PRItime" (currently simply being
defined to "lu") to make it easier to change the data type used for
timestamps.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two instances of %ld being used for unsigned longs
Signed-off-by: Mike Ralphson <mike.ralphson@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
prefixcmp() and suffixcmp() share the common "cmp" suffix that
typically are used to name functions that can be used for ordering,
but they can't, because they are not antisymmetric:
prefixcmp("foo", "foobar") < 0
prefixcmp("foobar", "foo") == 0
We in fact do not use these functions for ordering. Replace them
with functions that just check for equality.
Add starts_with() and end_with() that will be used to replace
prefixcmp() and suffixcmp(), respectively, as the first step. These
are named after corresponding functions/methods in programming
languages, like Java, Python and Ruby.
In vcs-svn/fast_export.c, there was already an ends_with() function
that did the same thing. Let's use the new one instead while at it.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Search for a note attached to the ref to update and read it's
'Revision-number:'-line. Start import from the next svn revision.
If there is no next revision in the svn repo, svnrdump terminates with
a message on stderr an non-zero return value. This looks a little
weird, but there is no other way to know whether there is a new
revision in the svn repo.
On the start of an incremental import, the parent of the first commit
in the fast-import stream is set to the branch name to update. All
following commits specify their parent by a mark number. Previous mark
files are currently not reused.
Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com>
Acked-by: David Michael Barr <b@rr-dav.id.au>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To provide metadata from svn dumps for further processing, e.g.
branch detection, attach a note to each imported commit that stores
additional information. The notes are currently hard-coded in
refs/notes/svn/revs. Currently the following lines from the svn dump
are directly accumulated in the note. This can be refined as needed.
- "Revision-number"
- "Node-path"
- "Node-kind"
- "Node-action"
- "Node-copyfrom-path"
- "Node-copyfrom-rev"
Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com>
Acked-by: David Michael Barr <b@rr-dav.id.au>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast_export lacked a method to writes notes to fast-import stream.
Add two new functions fast_export_note which is similar to
fast_export_modify. And also add fast_export_buf_to_data to be able to
write inline blobs that don't come from a line_buffer or from delta
application.
To be used like this:
fast_export_begin_commit("refs/notes/somenotes", ...)
fast_export_note("refs/heads/master", "inline")
fast_export_buf_to_data(&data)
or maybe
fast_export_note("refs/heads/master", sha1)
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com>
Acked-by: David Michael Barr <b@rr-dav.id.au>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reference to update by the fast-import stream is hard-coded. When
fetching from a remote the remote-helper shall update refs in a
private namespace, i.e. a private subdir of refs/. This namespace is
defined by the 'refspec' capability, that the remote-helper advertises
as a reply to the 'capabilities' command.
Extend svndump and fast-export to allow passing the target ref.
Update svn-fe to be compatible.
Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com>
Acked-by: David Michael Barr <b@rr-dav.id.au>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These are already safe because both sides of the comparison are
nonnegative.
This would normally not be important because Git is not -Wsign-compare
clean anyway, but we like to keep the vcs-svn/ lib to a higher
standard for convenience using it in other projects.
Signed-off-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
memmem is a GNU extension.
Avoiding it makes the code clearer and makes it easier for projects
that don't share git's compat/ code, such as the standalone
svn-dump-fast-export project, to reuse the vcs-svn/ library.
Signed-off-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Since v1.7.5~42^2~6 (vcs-svn: remove buffer_read_string)
buffer_reset() does nothing thus fast_export_reset() also.
Signed-off-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
There is no reason in principle that an svn-format dump would not
be able to represent a file whose length does not fit in a 32-bit
integer. Use off_t consistently (instead of uint32_t) to represent
file lengths so we can handle that.
Most of our code is already ready to do that without this patch and
already passes values of type off_t around. The type mismatch due to
stragglers was noticed with gcc -Wtype-limits.
Inspired-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
first_commit_done has zero as a default value, but it
is not reset back to zero in fast_export_init.
Reset it back to zero so that each export will have
proper initial state.
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
When importing from a dump with deltas, first fast_export_init calls
buffer_fdinit, and then init_report_buffer calls fdopen once again
when processing the first delta. The second initialization is
redundant and leaks a FILE *.
Remove the redundant on-demand initialization to fix this.
Initializing directly in fast_export_init is simpler and lets the
caller pass an int specifying which fd to use instead of hard-coding
REPORT_FILENO.
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
A corrupt Subversion-format delta can request reads past the end of
the preimage. Set sliding_view::max_off so such corruption is caught
when it appears rather than blocking in an impossible-to-fulfill
read() when input is coming from a socket or pipe.
Inspired-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed integer overflow produces undefined behavior in C and off_t is
a signed type. For predictable behavior, add some checks to protect
in advance against overflow.
On 32-bit systems ftell as called by buffer_tmpfile_prepare_to_read
is likely to fail with EOVERFLOW when reading the corresponding
postimage, and this patch does not fix that. So it's more of a
futureproofing measure than a complete fix.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Handle input in Subversion's dumpfile format, version 3. This is the
format produced by "svnrdump dump" and "svnadmin dump --deltas", and
the main difference between v3 dumpfiles and the dumpfiles already
handled is that these can include nodes whose properties and text are
expressed relative to some other node.
To handle such nodes, we find which node the text and properties are
based on, handle its property changes, use the cat-blob command to
request the basis blob from the fast-import backend, use the
svndiff0_apply() helper to apply the text delta on the fly, writing
output to a temporary file, and then measure that postimage file's
length and write its content to the fast-import stream.
The temporary postimage file is shared between delta-using nodes to
avoid some file system overhead.
The svn-fe interface needs to be more complicated to accomodate the
backward flow of information from the fast-import backend to svn-fe.
The backflow fd is not needed when parsing streams without deltas,
though, so existing scripts using svn-fe on v2 dumps should
continue to work.
NEEDSWORK: generalize interface so caller sets the backflow fd, close
temporary file before exiting
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
* db/svn-fe-code-purge:
vcs-svn: drop obj_pool
vcs-svn: drop treap
vcs-svn: drop string_pool
vcs-svn: pass paths through to fast-import
Conflicts:
vcs-svn/fast_export.c
vcs-svn/fast_export.h
vcs-svn/repo_tree.c
vcs-svn/repo_tree.h
vcs-svn/string_pool.c
vcs-svn/svndump.c
vcs-svn/trp.txt
This teaches svn-fe to incrementally import into an existing
repository (at last!) at the expense of less convenient UI. Think of
it as growing pains. This opens the door to many excellent things,
and it would be a bad idea to discourage people from building on it
for much longer.
* db/vcs-svn-incremental:
vcs-svn: avoid using ls command twice
vcs-svn: use mark from previous import for parent commit
vcs-svn: handle filenames with dq correctly
vcs-svn: quote paths correctly for ls command
vcs-svn: eliminate repo_tree structure
vcs-svn: add a comment before each commit
vcs-svn: save marks for imported commits
vcs-svn: use higher mark numbers for blobs
vcs-svn: set up channel to read fast-import cat-blob response
Conflicts:
t/t9010-svn-fe.sh
vcs-svn/fast_export.c
vcs-svn/fast_export.h
vcs-svn/repo_tree.c
vcs-svn/svndump.c
gcc -m32 correctly warns:
vcs-svn/fast_export.c: In function 'fast_export_commit':
vcs-svn/fast_export.c:54:2: warning: format '%llu' expects
argument of type 'long long unsigned int', but argument 2
has type 'unsigned int' [-Wformat]
Fix it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Pass the log message by strbuf instead of as a C-style string and use
fwrite instead of printf to write it to fast-import so embedded '\0'
bytes can be preserved.
Currently "git log" doesn't show the embedded NULs but "git cat-file
commit" can.
While at it, stop including system headers from repo_tree.h. git
source files need to include git-compat-util.h (or cache.h or
builtin.h) sooner to ensure the appropriate feature test macros are
defined.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Now that there is no internal representation of the repo, it is not
necessary to tokenise paths. Use strbuf instead and bypass
string_pool.
This means svn-fe can handle arbitrarily long paths (as long as a
strbuf can fit them), with arbitrarily many path components.
While at it, since we now treat paths in their entirety, only quote
when necessary.
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
* db/strbufs-for-metadata:
vcs-svn: use strbuf for author, UUID, and URL
vcs-svn: use strbuf for revision log
Conflicts:
vcs-svn/fast_export.c
vcs-svn/fast_export.h
vcs-svn/repo_tree.c
vcs-svn/svndump.c
Use strbufs and strings instead of interned strings for values of rev,
dump, and node fields that happen to be strings. After this change,
the only remaining string_pool use is for paths in the repo_tree API
and internals.
Functional change: treat an empty author, UUID, or URL as none at all.
So for example, in repos where the first revision has an empty
svn:author property, the first rev will be treated as by "nobody"
rather than by a person with empty name and email address created by
prepending an @ sign to the repository UUID.
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Catch input errors and exit early enough to print a reasonable
diagnosis based on errno.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
With this patch, overlapping incremental imports work.
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Quote paths passed to fast-import so filenames with double quotes are
not misinterpreted.
One might imagine this could help with filenames with newlines, too,
but svn does not allow those.
Helped-by: David Barr <daivd.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
This bug was found while importing rev 601865 of ASF.
[jn: with test]
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Rely on fast-import for information about previous revs.
This requires always setting up backward flow of information, even for
v2 dumps. On the plus side, it simplifies the code by quite a bit and
opens the door to further simplifications.
[db: adjusted to support final version of the cat-blob patch]
[jn: avoiding hard-coding git's name for the empty tree for
portability to other backends]
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Current svn-fe produces output like this:
blob
mark :7382321
data 5
hello
blob
mark :7382322
data 5
Hello
commit
mark :3
[...]
M 100644 :7382321 hello.c
M 100644 :7382322 hello2.c
This means svn-fe has to keep track of the paths modified in each
commit and the corresponding marks, instead of dealing with each file
as it arrives in input and then forgetting about it. A better
strategy would be to use inline blobs:
commit
mark :3
[...]
M 100644 inline hello.c
data 5
hello
[...]
As a first step towards that, teach svn-fe to notice when the
collection of blobs for each commit starts and write a comment
("# commit 3.") there.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
This way, a person can use
svnadmin dump $path |
svn-fe |
git fast-import --relative-marks --export-marks=svn-revs
to get a list of what commit corresponds to each svn revision (plus
some irrelevant blob names) in .git/info/fast-import/svn-revs.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Set up some plumbing: teach the svndump lib to pass a file descriptor
number to the fast_export lib, representing where cat-blob/ls
responses can be read from, and add a get_response_line helper
function to the fast_export lib to read a line from that file.
Unfortunately this means that svn-fe needs file descriptor 3 to be
redirected from somewhere (preferrably the cat-blob stream of a
fast-import backend); otherwise it will fail:
$ svndump <path> | svn-fe
fatal: cannot read from file descriptor 3: Bad file descriptor
For the moment, "svn-fe 3</dev/null" works as a workaround but it
will not work for very long. A fast-import backend that can retrieve
old commits is needed in order to be able to fulfill svn
"Node-copyfrom-rev" requests that refer to revs from a previous run.
[jn: with new change description]
Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Collect the line_buffer state in a newly public line_buffer struct.
Callers can use multiple line_buffers to manage input from multiple
files at a time.
svn-fe's delta applier will use this to stream a delta from svnrdump
and the preimage it applies to from fast-import at the same time.
The tests don't take advantage of the new features, but I think that's
okay. It is easier to find lingering examples of nonreentrant code by
searching for "static" in line_buffer.c.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
In particular, on systems that define uint32_t as an unsigned long,
gcc complains as follows:
CC vcs-svn/fast_export.o
vcs-svn/fast_export.c: In function `fast_export_modify':
vcs-svn/fast_export.c:28: warning: unsigned int format, uint32_t arg (arg 2)
vcs-svn/fast_export.c:28: warning: int format, uint32_t arg (arg 3)
vcs-svn/fast_export.c: In function `fast_export_commit':
vcs-svn/fast_export.c:42: warning: int format, uint32_t arg (arg 5)
vcs-svn/fast_export.c:62: warning: int format, uint32_t arg (arg 2)
vcs-svn/fast_export.c: In function `fast_export_blob':
vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 2)
vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 3)
CC vcs-svn/svndump.o
vcs-svn/svndump.c: In function `svndump_read':
vcs-svn/svndump.c:260: warning: int format, uint32_t arg (arg 3)
In order to suppress the warnings we use the C99 format specifier
macros PRIo32 and PRIu32 from <inttypes.h>.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the spirit of v1.6.4-rc0~124 (MinGW: Fix compiler warning in
merge-recursive, 2009-05-23), use a 32-bit integer instead; the
dump file parser does not support any better, anyway.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
repo_tree maintains the exporter's state and provides a facility to to
call fast_export, which writes objects to stdout suitable for
consumption by fast-import.
The exported functions roughly correspond to Subversion FS operations.
. repo_add, repo_modify, repo_copy, repo_replace, and repo_delete
update the current commit, based roughly on the corresponding
Subversion FS operation.
. repo_commit calls out to fast_export to write the current commit to
the fast-import stream in stdout.
. repo_diff is used by the fast_export module to write the changes
for a commit.
. repo_reset erases the exporter's state, so valgrind can be happy.
[rr: squelched compiler warnings]
[jn: removed support for maintaining state on-disk, though we may
want to add it back later]
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>