1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-18 15:04:49 +01:00
Commit graph

113 commits

Author SHA1 Message Date
David Barr
044ad2906a vcs-svn: implement perfect hash for node-prop keys
Instead of interning property names and comparing their string_pool
keys, look them up in a table by string length, which should be about
as fast.

This is a small step towards removing dependence on string_pool.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:09:02 -05:00
David Barr
7c5817d3ba vcs-svn: use strbuf for author, UUID, and URL
Use strbufs and strings instead of interned strings for values of rev,
dump, and node fields that happen to be strings.  After this change,
the only remaining string_pool use is for paths in the repo_tree API
and internals.

Functional change: treat an empty author, UUID, or URL as none at all.
So for example, in repos where the first revision has an empty
svn:author property, the first rev will be treated as by "nobody"
rather than by a person with empty name and email address created by
prepending an @ sign to the repository UUID.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:01:48 -05:00
David Barr
dce33c9c18 vcs-svn: use strbuf for revision log
obj_pool is overkill for this application: all that is needed is a
buffer that can resize from rev to rev to accomodate differently-sized
strings.  In the spirit of commit deadcef4 (2010-11-06), use a strbuf
instead.

This is a small step towards removing dependence on obj_pool.h.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:41:36 -05:00
Jonathan Nieder
c9d1c8ba05 vcs-svn: improve reporting of input errors
Catch input errors and exit early enough to print a reasonable
diagnosis based on errno.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:41:09 -05:00
Jonathan Nieder
26557fc1b3 vcs-svn: make buffer_copy_bytes return length read
Currently buffer_copy_bytes does not report to its caller whether
it encountered an early end of file.

Add a return value representing the number of bytes read (but not
the number of bytes copied).  This way all three unusual conditions
can be distinguished: input error with buffer_ferror, output error
with ferror(outfile), early end of input by checking the return
value.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:40:26 -05:00
Jonathan Nieder
d234f54b2f vcs-svn: make buffer_skip_bytes return length read
Currently there is no way to detect when input ended if it ended
early during buffer_skip_bytes.  Tell the calling program how many
bytes were actually skipped for easier debugging.

Existing callers will still ignore early EOF.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:39:54 -05:00
Jonathan Nieder
93b709c79e vcs-svn: improve support for reading large files
Move from uint32_t to off_t as the fundamental unit of length used by
the line_buffer library.  Performance would get worse if anything but
I think it's worth it for support of deltas that need to skip large
pieces (> 4 GiB).

Exception: buffer_read_string still takes a uint32_t, since it keeps
its result in an in-core obj_pool.

Callers still have to be updated to take advantage of this.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:32:09 -05:00
Jonathan Nieder
06316234ac vcs-svn: remove spurious semicolons
trp_gen is not a statement or function call, so it should not be
followed with a semicolon.  Noticed by gcc -pedantic.

 vcs-svn/repo_tree.c:41:81: warning: ISO C does not allow extra ';'
  outside of a function [-pedantic]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-16 12:56:23 -07:00
David Barr
dd3f42ad79 vcs-svn: use mark from previous import for parent commit
With this patch, overlapping incremental imports work.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:58 -06:00
Jonathan Nieder
1ae469b06c vcs-svn: handle filenames with dq correctly
Quote paths passed to fast-import so filenames with double quotes are
not misinterpreted.

One might imagine this could help with filenames with newlines, too,
but svn does not allow those.

Helped-by: David Barr <daivd.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:58 -06:00
David Barr
e435811208 vcs-svn: quote paths correctly for ls command
This bug was found while importing rev 601865 of ASF.

[jn: with test]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:58 -06:00
Jonathan Nieder
723b7a2789 vcs-svn: eliminate repo_tree structure
Rely on fast-import for information about previous revs.

This requires always setting up backward flow of information, even for
v2 dumps.  On the plus side, it simplifies the code by quite a bit and
opens the door to further simplifications.

[db: adjusted to support final version of the cat-blob patch]
[jn: avoiding hard-coding git's name for the empty tree for
 portability to other backends]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:58 -06:00
Jonathan Nieder
7e11902c99 vcs-svn: add a comment before each commit
Current svn-fe produces output like this:

	blob
	mark :7382321
	data 5
	hello

	blob
	mark :7382322
	data 5
	Hello

	commit
	mark :3
[...]
	M 100644 :7382321 hello.c
	M 100644 :7382322 hello2.c

This means svn-fe has to keep track of the paths modified in each
commit and the corresponding marks, instead of dealing with each file
as it arrives in input and then forgetting about it.  A better
strategy would be to use inline blobs:

	commit
	mark :3
[...]
	M 100644 inline hello.c
	data 5
	hello
[...]

As a first step towards that, teach svn-fe to notice when the
collection of blobs for each commit starts and write a comment
("# commit 3.") there.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
Jonathan Nieder
78e1a3ff23 vcs-svn: save marks for imported commits
This way, a person can use

	svnadmin dump $path |
	svn-fe |
	git fast-import --relative-marks --export-marks=svn-revs

to get a list of what commit corresponds to each svn revision (plus
some irrelevant blob names) in .git/info/fast-import/svn-revs.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
Jonathan Nieder
d38f84484f vcs-svn: use higher mark numbers for blobs
Prepare to use mark :5 for the commit corresponding to r5 (and so on).

1 billion seems sufficiently high for blob marks to avoid conflicting
with rev marks, while still leaving room for 3 billion blobs.  Such
high mark numbers cause trouble with ancient fast-import versions, but
this topic cannot support git fast-import versions before 1.7.4 (which
introduces the cat-blob command) anyway.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
David Barr
41529bbce4 vcs-svn: set up channel to read fast-import cat-blob response
Set up some plumbing: teach the svndump lib to pass a file descriptor
number to the fast_export lib, representing where cat-blob/ls
responses can be read from, and add a get_response_line helper
function to the fast_export lib to read a line from that file.

Unfortunately this means that svn-fe needs file descriptor 3 to be
redirected from somewhere (preferrably the cat-blob stream of a
fast-import backend); otherwise it will fail:

	$ svndump <path> | svn-fe
	fatal: cannot read from file descriptor 3: Bad file descriptor

For the moment, "svn-fe 3</dev/null" works as a workaround but it
will not work for very long.  A fast-import backend that can retrieve
old commits is needed in order to be able to fulfill svn
"Node-copyfrom-rev" requests that refer to revs from a previous run.

[jn: with new change description]

Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
Jonathan Nieder
efc749b48f vcs-svn: allow input errors to be detected promptly
The line_buffer library silently flags input errors until
buffer_deinit time; unfortunately, by that point usually errno is
invalid.  Expose the error flag so callers can check for and
report errors early for easy debugging.

	some_error_prone_operation(...);
	if (buffer_ferror(buf))
		return error("input error: %s", strerror(errno));

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:32:51 -06:00
Jonathan Nieder
e75316de53 vcs-svn: simplify repo_modify_path and repo_copy
Restrict the repo_tree API to functions that are actually needed.

 - decouple reading the mode and content of dirents from other
   operations.
 - remove repo_modify_path.  It is only used to read the mode from
   dirents.
 - remove the ability to use repo_read_mode on a missing path.  The
   existing code only errors out in that case, anyway.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 00:56:50 -06:00
Jonathan Nieder
5a38b186d3 vcs-svn: handle_node: use repo_read_path
svn-fe processes each commit in two stages: first decide on the
correct content for all paths and export the relevant blobs, then
export a commit with the result.

But we can keep less state and simplify svn-fe a great deal by
exporting the commit in one step: use 'inline' blobs for each path and
remember nothing.  This way, the repo_tree structure could be
eliminated, and we would get support for incremental imports 'for
free'.

Reorganize handle_node along these lines.  This is just a code
cleanup; the changes in repo_tree and handle_revision will come later.

[db: backported to apply without text delta support]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 00:56:50 -06:00
Jonathan Nieder
4f5de755a7 vcs-svn: introduce repo_read_path to check the content at a path
The repo_tree structure remembers, for each path in each revision, a
mode (regular file, executable, symlink, or directory) and content
(blob mark or directory structure).  Maintaining a second copy of all
this information when it's already in the target repository is
wasteful, it does not persist between svn-fe invocations, and most
importantly, there is no convenient way to transfer it from one
machine to another.  So it would be nice to get rid of it.

As a first step, let's change the repo_tree API to match fast-import's
read commands more closely.  Currently to read the mode for a path,
one uses

	repo_modify_path(path, new_mode, new_content);

which changes the mode and content as a side effect.  There is no
function to read the content at a path; add one.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 00:56:50 -06:00
Jonathan Nieder
a62bbf8f01 Merge commit 'jn/svn-fe' of git://github.com/gitster/git into svn-fe
* git://github.com/gitster/git:
  vcs-svn: Allow change nodes for root of tree (/)
  vcs-svn: Implement Prop-delta handling
  vcs-svn: Sharpen parsing of property lines
  vcs-svn: Split off function for handling of individual properties
  vcs-svn: Make source easier to read on small screens
  vcs-svn: More dump format sanity checks
  vcs-svn: Reject path nodes without Node-action
  vcs-svn: Delay read of per-path properties
  vcs-svn: Combine repo_replace and repo_modify functions
  vcs-svn: Replace = Delete + Add
  vcs-svn: handle_node: Handle deletion case early
  vcs-svn: Use mark to indicate nodes with included text
  vcs-svn: Unclutter handle_node by introducing have_props var
  vcs-svn: Eliminate node_ctx.mark global
  vcs-svn: Eliminate node_ctx.srcRev global
  vcs-svn: Check for errors from open()
  vcs-svn: Allow simple v3 dumps (no deltas yet)

Conflicts:
	t/t9010-svn-fe.sh
	vcs-svn/svndump.c
2011-02-26 05:21:29 -06:00
Jonathan Nieder
b1c9b798a6 vcs-svn: teach line_buffer about temporary files
It can sometimes be useful to write information temporarily to file,
to read back later.  These functions allow a program to use the
line_buffer facilities when doing so.

It works like this:

 1. find a unique filename with buffer_tmpfile_init.
 2. rewind with buffer_tmpfile_rewind.  This returns a stdio
    handle for writing.
 3. when finished writing, declare so with
    buffer_tmpfile_prepare_to_read.  The return value indicates
    how many bytes were written.
 4. read whatever portion of the file is needed.
 5. if finished, remove the temporary file with buffer_deinit.
    otherwise, go back to step 2,

The svn support would use this to buffer the postimage from delta
application until the length is known and fast-import can receive
the resulting blob.

Based-on-patch-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:59:37 -06:00
Jonathan Nieder
cb3f87cf1b vcs-svn: allow input from file descriptor
Based-on-patch-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:59:37 -06:00
Jonathan Nieder
cc193f1f0b vcs-svn: allow character-oriented input
buffer_read_char can be used in place of buffer_read_string(1) to
avoid consuming valuable static buffer space.  The delta applier will
use this to read variable-length integers one byte at a time.

Underneath, it is fgetc, wrapped so the line_buffer library can
maintain its role as gatekeeper of input.

Later it might be worth checking if fgetc_unlocked is faster ---
most line_buffer functions are not thread-safe anyway.

Helpd-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:59:37 -06:00
Jonathan Nieder
e832f43c1d vcs-svn: add binary-safe read function
buffer_read_string works well for non line-oriented input except for
one problem: it does not tell the caller how many bytes were actually
written.  This means that unless one is very careful about checking
for errors (and eof) the calling program cannot tell the difference
between the string "foo" followed by an early end of file and the
string "foo\0bar\0baz".

So introduce a variant that reports the length, too, a thinner wrapper
around strbuf_fread.  Its result is written to a strbuf so the caller
does not need to keep track of the number of bytes read.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:59:37 -06:00
Jonathan Nieder
e5e45ca1e3 vcs-svn: teach line_buffer to handle multiple input files
Collect the line_buffer state in a newly public line_buffer struct.
Callers can use multiple line_buffers to manage input from multiple
files at a time.

svn-fe's delta applier will use this to stream a delta from svnrdump
and the preimage it applies to from fast-import at the same time.

The tests don't take advantage of the new features, but I think that's
okay.  It is easier to find lingering examples of nonreentrant code by
searching for "static" in line_buffer.c.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:59 -06:00
Jonathan Nieder
d350822fa7 vcs-svn: collect line_buffer data in a struct
Prepare for the line_buffer lib to support input from multiple files,
by collecting global state in a struct that can be easily passed
around.

No API change yet.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:58 -06:00
Jonathan Nieder
deadcef4c1 vcs-svn: replace buffer_read_string memory pool with a strbuf
obj_pool is inherently global and does not use the standard growing
factor alloc_nr, which makes it feel out of place in the git codebase.
Plus it is overkill for this application: all that is needed is a
buffer that can grow between requests to accomodate larger strings.
Use a strbuf instead.

As a side effect, this improves the error handling: allocation
failures will result in a clean exit instead of segfaults.  It would
be nice to add a test case (using ulimit or failmalloc) but that can
wait for another day.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:58 -06:00
Jonathan Nieder
4d21bec0d2 vcs-svn: eliminate global byte_buffer
The data stored in byte_buffer[] is always either discarded or
written to stdout immediately.  No need for it to persist between
function calls.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:58 -06:00
Ramsay Jones
5ee5f5a65d svndump.c: Fix a printf format compiler warning
In particular, on systems that define uint32_t as an unsigned long,
gcc complains as follows:

        CC vcs-svn/svndump.o
    vcs-svn/svndump.c: In function `svndump_read':
    vcs-svn/svndump.c:215: warning: int format, uint32_t arg (arg 2)

In order to suppress the warning we use the C99 format specifier
macro PRIu32 from <inttypes.h>.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-18 16:48:47 -08:00
Junio C Hamano
bf9b46c16d Merge branch 'jn/svn-fe' (early part)
* 'jn/svn-fe' (early part):
  vcs-svn: Error out for v3 dumps

Conflicts:
	t/t9010-svn-fe.sh
2011-01-05 13:34:43 -08:00
Junio C Hamano
b720c75afd Merge branch 'jn/maint-svn-fe'
* jn/maint-svn-fe:
  t9010 fails when no svn is available
  vcs-svn: fix intermittent repo_tree corruption
  treap: make treap_insert return inserted node
  t9010 (svn-fe): Eliminate dependency on svn perl bindings
2010-12-16 12:49:35 -08:00
Jonathan Nieder
9e8c532108 vcs-svn: Allow change nodes for root of tree (/)
It is not uncommon for a svn repository to include change records for
properties at the top level of the tracked tree:

	Node-path:
	Node-kind: dir
	Node-action: change
	Prop-delta: true
	Prop-content-length: 43
	Content-length: 43

	K 10
	svn:ignore
	V 11
	build-area

	PROPS-END

Unfortunately a recent svn-fe change (vcs-svn: More dump format sanity
checks, 2010-11-19) causes such nodes to be rejected with the error
message

	fatal: invalid dump: path to be modified is missing

The repo_tree module does not keep a dirent for the root of the tree.
Add a block to the dump parser to take care of this case.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-07 16:04:56 -08:00
Jonathan Nieder
3c93983875 vcs-svn: fix intermittent repo_tree corruption
Pointers to directory entries do not remain valid after a call to
dent_insert.

Noticed in the course of importing a small Subversion repository
(~1000 revs); after setting up a dirent for a certain path as a
placeholder, by luck dent_insert would trigger a realloc that
shifted around addresses, resulting in an import with that file
replaced by a directory.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-07 16:04:02 -08:00
Jonathan Nieder
97a5e3453a treap: make treap_insert return inserted node
Suppose I try the following:

	struct int_node *node = node_pointer(node_alloc(1));
	node->n = 5;
	treap_insert(&root, node);
	printf("%d\n", node->n);

Usually the result will be 5.  But since treap_insert draws memory
from the node pool, if the caller is unlucky then (1) the node pool
will be full and (2) realloc will be forced to move the node pool to
find room, so the node address becomes invalid and the result of
dereferencing it is undefined.

So we ought to use offsets in preference to pointers for references
that would remain valid after a call to treap_insert.  Tweak the
signature to hint at a certain special case: since the inserted node
can change address (though not offset), as a convenience teach
treap_insert to return its new address.

So the motivational example could be fixed by adding "node =".

	struct int_node *node = node_pointer(node_alloc(1));
	node->n = 5;
	node = treap_insert(&root, node);
	printf("%d\n", node->n);

Based on a true story.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-07 16:03:55 -08:00
David Barr
6b01b67658 vcs-svn: Implement Prop-delta handling
The rules for what file is used as delta source for each file are not
documented in dump-load-format.txt.  Luckily, the Apache Software
Foundation repository has rich enough examples to figure out most of
the rules:

Node-action: replace implies the empty property set and empty text as
preimage for deltas.  Otherwise, if a copyfrom source is given, that
node is the preimage for deltas.  Lastly, if none of the above applies
and the node path exists in the current revision, then that version
forms the basis.

[jn: refactored, with tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:53:59 -08:00
Jonathan Nieder
6263c06d49 vcs-svn: Sharpen parsing of property lines
Prepare to add a new type of property line (the 'D' line) to
handle property deltas.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:53:58 -08:00
Jonathan Nieder
2a48afe1c2 vcs-svn: Split off function for handling of individual properties
The handle_property function is the part of read_props that would be
interesting for most people: semantics of properties rather than the
algorithm for parsing them.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:53:58 -08:00
Jonathan Nieder
3f3e676d6e vcs-svn: Make source easier to read on small screens
Remove some newlines from handle_node() that are not needed for
clarity.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:53:58 -08:00
Jonathan Nieder
c7dbf35e91 vcs-svn: More dump format sanity checks
Node-action: change is not appropriate when switching between file and
directory or adding a new file.  Current svn-fe silently accepts such
nodes and the resulting tree has missing files in the "changed when
meant to add" case.

Node-action: add requires some content (text or directory); there is
no such thing as an "intent to add" node in svn dumps.  Current svn-fe
accepts such contentless adds but produces an invalid fast-import
stream that refers to nonexistent mark :0 in response.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:52:51 -08:00
Jonathan Nieder
414e569e45 vcs-svn: Reject path nodes without Node-action
It would be better to flag such errors and let the import proceed
anyway, but for now it is simpler not to worry about recovery
from such weird cases.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:52:47 -08:00
Jonathan Nieder
1c7bb31616 vcs-svn: Delay read of per-path properties
The mode for each file in an svn-format dump is kept in the properties
section.  The properties section is read as soon as possible to allow
the correct mode to be filled in when registering the file with the
repo_tree lib.

To support nodes with a missing properties section, svn-fe determines
the mode in three stages:

 - The kind (directory or file) of the node is read from the dump and
   used to make an initial estimate (040000 or 100644).
 - Properties are read in and allowed to override this for symlinks
   and executables.
 - If there is no properties section, the mode from the previous
   content of the path is left alone, overriding the above
   considerations.

This is a bit of a mess, and worse, it would get even more complicated
once we start to support property deltas.  If we could only register
the file with a provisional value for mode and then change it later
when properties say so, the procedure would be much simpler.

... oh, right, we can.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:43 -08:00
Jonathan Nieder
08c39b5c44 vcs-svn: Combine repo_replace and repo_modify functions
There are two functions to change the staged content for a path in the
svn importer's active commit: repo_replace, which changes the text and
returns the mode, and repo_modify, which changes the text and mode and
returns nothing.

Worse, there are more subtle differences:

 - A mark of 0 passed to repo_modify means "use the existing content".
   repo_replace uses it as mark :0 and produces a corrupt stream.

 - When passed a path that is not part of the active commit,
   repo_replace returns without doing anything.  repo_modify
   transparently adds a new directory entry.

Get rid of both and introduce a new function with the best features of
both: repo_modify_path modifies the mode, content, or both for a path,
depending on which arguments are zero.  If no such dirent already
exists, it does nothing and reports the error by returning 0.
Otherwise, the return value is the resulting mode.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:43 -08:00
Jonathan Nieder
6ee4a9be48 vcs-svn: Replace = Delete + Add
Simplify by reducing the "Node-action: replace" case to "Node-action:
add".  This way, the main part of handle_node() only has to deal with
"add" and "change" nodes.

Functional change: replacing a symlink or executable without setting
properties will reset the mode.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:43 -08:00
Jonathan Nieder
5af8fae2df vcs-svn: handle_node: Handle deletion case early
Take care of "Node-action: delete" as soon as possible, so we can stop
worrying about that case in the rest of the function.

Functional change: catch deletion nodes with features that would not
apply to them (text, properties, or origin data) and error out for
those cases.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:43 -08:00
Jonathan Nieder
462e1f51a5 vcs-svn: Use mark to indicate nodes with included text
Allocate a mark if needed as soon as possible so later code can use
"if (mark)" to check if this node has text attached rather than
explicitly checking for Text-content-length.

While at it, reject directory nodes with text attached; the presence
of such a node would indicate a bug in the dump generator or svn-fe's
understanding.  In the long term, it would be nice to be able to
continue parsing and save the error for later, but for now it is
simpler to error out right away.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:43 -08:00
Jonathan Nieder
d6e81a0315 vcs-svn: Unclutter handle_node by introducing have_props var
It is possible for a path node in an SVN-format dump file to leave out
the properties section.  svn-fe handles this by carrying over the
properties (in particular, file type) from the old version of that
node.

To support this, handle_node tests several times whether a
Prop-content-length field is present.  Ancient Subversion actually
leaves out the Prop-content-length field even for nodes with
properties, so that's not quite the right check.  Besides, this detail
of mechanism is distracting when the question at hand is instead what
content the new node should have.

So introduce a local have_props variable.  The semantics are the same
as before; the adaptations to support ancient streams that leave out
the prop-content-length can wait until someone needs them.

Signed-off-by: Jonathan Nieder <jrnieer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:42 -08:00
Jonathan Nieder
da3e217447 vcs-svn: Eliminate node_ctx.mark global
The mark variable is only used in handle_node().  Its life is
very short and simple: first, a new mark number is allocated if
this node has text attached, then that mark is recorded in the
in-core tree being built up, and lastly the mark is communicated
to fast-import in the stream along with the associated text.

A new reader may worry about interaction with other code, especially
since mark is not initialized to zero in handle_node() itself.
Disperse such worries by making it local.  No functional change
intended.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:42 -08:00
Jonathan Nieder
1d13e9f600 vcs-svn: Eliminate node_ctx.srcRev global
The srcRev variable is only used in handle_node(); its purpose
is to hold the old mode for a path, to only be used if properties
are not being changed.  Narrow its scope to make its meaningful
lifetime more obvious.

No functional change intended.  Add some tests as a sanity-check
for the simplest case (no renames).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:42 -08:00
Jonathan Nieder
5c28a8b054 vcs-svn: Check for errors from open()
test-svn-fe segfaults when passed a bogus path.  Simplify debugging by
exiting with a meaningful error message instead.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:51:42 -08:00
David Barr
1f05d07c45 vcs-svn: Allow simple v3 dumps (no deltas yet)
Since the dumpfile version 1 days, the Subversion dump format
gained some new fields:

 - a unique identifier for the repository (version 2 format)
 - whether the text and properties for a node should be
   interpreted as deltas
 - checksums for a delta's preimage
 - SHA-1 sums as alternatives to the existing MD5 checksums for
   copy source and the payload (delta).

For now what is relevant to us is the Text-delta and Prop-delta
fields, since not noticing these causes a dump file to be
misinterpreted (see the previous commit).

[jn: with tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:48:54 -08:00
Jonathan Nieder
b3e5bce1aa vcs-svn: Error out for v3 dumps
By ignoring the Text-Delta and Prop-Delta node fields, current svn-fe
happily mistakes deltas for full text and instead of cleanly erroring
out, it produces a valid but semantically bogus fast-import stream
when fed a dump file in the modern "svnadmin dump --deltas" format.

Dump file parsers are supposed to ignore header fields they don't
understand (to allow for backward-compatible extensions), but they are
also supposed to check the SVN-fs-dump-format-version header to
prevent misinterpretation of non backward-compatible extensions.
Do so.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24 14:48:52 -08:00
Ramsay Jones
5418d96ddc vcs-svn: Fix some printf format compiler warnings
In particular, on systems that define uint32_t as an unsigned long,
gcc complains as follows:

      CC vcs-svn/fast_export.o
  vcs-svn/fast_export.c: In function `fast_export_modify':
  vcs-svn/fast_export.c:28: warning: unsigned int format, uint32_t arg (arg 2)
  vcs-svn/fast_export.c:28: warning: int format, uint32_t arg (arg 3)
  vcs-svn/fast_export.c: In function `fast_export_commit':
  vcs-svn/fast_export.c:42: warning: int format, uint32_t arg (arg 5)
  vcs-svn/fast_export.c:62: warning: int format, uint32_t arg (arg 2)
  vcs-svn/fast_export.c: In function `fast_export_blob':
  vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 2)
  vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 3)
      CC vcs-svn/svndump.o
  vcs-svn/svndump.c: In function `svndump_read':
  vcs-svn/svndump.c:260: warning: int format, uint32_t arg (arg 3)

In order to suppress the warnings we use the C99 format specifier
macros PRIo32 and PRIu32 from <inttypes.h>.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-12 10:24:55 -07:00
Jonathan Nieder
6117abae56 vcs-svn: Avoid %z in format string
In the spirit of v1.6.4-rc0~124 (MinGW: Fix compiler warning in
merge-recursive, 2009-05-23), use a 32-bit integer instead; the
dump file parser does not support any better, anyway.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:38 -07:00
Jonathan Nieder
68b4cfbc91 vcs-svn: Rename dirent pool to build on Windows
dirent is #define’d to mingw_dirent in compat/mingw.h, with the
result that

 obj_pool_gen(dirent, struct repo_dirent, 4096)

creates functions with names like mingw_dirent_alloc and
references to dirent_alloc go unresolved.  Rename the functions
to dent_* to avoid this problem.

Reported-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:38 -07:00
Jonathan Nieder
6ad263ce7a treap: style fix
Missing spaces in while (0) and trpn_pointer(a, b).

Remove parentheses around return value.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:38 -07:00
David Barr
21746aa34f SVN dump parser
svndump parses data that is in SVN dumpfile format produced by
`svnadmin dump` with the help of line_buffer and uses repo_tree and
fast_export to emit a git fast-import stream.

Based roughly on com.hydrografix.svndump 0.92 from the SvnToCCase
project at <http://svn2cc.sarovar.org/>, by Stefan Hegny and
others.

[rr: allow input from files other than stdin]
[jn: with test, more error reporting]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:38 -07:00
David Barr
c0e6c23dca Infrastructure to write revisions in fast-export format
repo_tree maintains the exporter's state and provides a facility to to
call fast_export, which writes objects to stdout suitable for
consumption by fast-import.

The exported functions roughly correspond to Subversion FS operations.

 . repo_add, repo_modify, repo_copy, repo_replace, and repo_delete
   update the current commit, based roughly on the corresponding
   Subversion FS operation.

 . repo_commit calls out to fast_export to write the current commit to
   the fast-import stream in stdout.

 . repo_diff is used by the fast_export module to write the changes
   for a commit.

 . repo_reset erases the exporter's state, so valgrind can be happy.

[rr: squelched compiler warnings]
[jn: removed support for maintaining state on-disk, though we may
want to add it back later]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00
David Barr
3bbaec00a8 Add stream helper library
This library provides thread-unsafe fgets()- and fread()-like
functions where the caller does not have to supply a buffer.  It
maintains a couple of static buffers and provides an API to use
them.

[rr: allow input from files other than stdin]
[jn: with tests, documentation, and error handling improvements]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00
David Barr
1d73b52f5b Add string-specific memory pool
Intern strings so they can be compared by address and stored without
wasting space.

This library uses the macros in the obj_pool.h and trp.h to create a
memory pool for strings and expose an API for handling them.

[rr: added API docs]
[jn: with some API simplifications, new documentation and tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00
Jason Evans
951f316470 Add treap implementation
Provide macros to generate a type-specific treap implementation and
various functions to operate on it. It uses obj_pool.h to store memory
nodes in a treap.  Previously committed nodes are never removed from
the pool; after any *_commit operation, it is assumed (correctly, in
the case of svn-fast-export) that someone else must care about them.

Treaps provide a memory-efficient binary search tree structure.
Insertion/deletion/search are about as about as fast in the average
case as red-black trees and the chances of worst-case behavior are
vanishingly small, thanks to (pseudo-)randomness.  The bad worst-case
behavior is a small price to pay, given that treaps are much simpler
to implement.

>From http://www.canonware.com/download/trp/trp_hash/trp.h

[db: Altered to reference nodes by offset from a common base pointer]
[db: Bob Jenkins' hashing implementation dropped for Knuth's]
[db: Methods unnecessary for search and insert dropped]
[rr: Squelched compiler warnings]
[db: Added support for immutable treap nodes]
[jn: Reintroduced treap_nsearch(); with tests]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00
David Barr
4709455db3 Add memory pool library
Add a memory pool library implemented using C macros. The
obj_pool_gen() macro creates a type-specific memory pool.

The memory pool library is distinguished from the existing specialized
allocators in alloc.c by using a contiguous block for all allocations.
This means that on one hand, long-lived pointers have to be written as
offsets, since the base address changes as the pool grows, but on the
other hand, the entire pool can be easily written to the file system.
This could allow the memory pool to persist between runs of an
application.

For the svn importer, such a facility is useful because each svn
revision can copy trees and files from any previous revision.  The
relevant information for all revisions has to persist somehow to
support incremental runs.

[rr: minor cleanups]
[jn: added tests; removed file system backing for now]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00
Jonathan Nieder
3f527372d9 Introduce vcs-svn lib
Teach the build system to build a separate library for the
upcoming subversion interop support.

The resulting vcs-svn/lib.a does not contain any code, nor is
it built during a normal build.  This is just scaffolding for
later changes.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00