Several bugs were found and fixed while getting this to work:
* Remember the 'R'(eplace) case of actions and treat it like we
would an 'A'(dd) case.
* Fix a small case of follow-parent missing a parent if a
subdirectory was modified in the revision where the parent was
copied.
* dirents returned by get_dir sometimes expire if the data
structure is too big and the pool is destroyed, so we
cache get_dir (along with check_path and get_revprops)
temporarily along with its pool.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
I incorrectly used $path/? and $path/* to strip off leading
directories, but places where $path = 'branches/0.17' would
incorrectly strip changes to 'branches/0.17.1' as well.
For globs, we require that our '*' is its own path component
(surrounded by '/' or nothing). Enforce this when --prefix= is
passed to us, too.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
--no-follow-parent disables and reverts it back to the old
default behavior of not following parents (if you don't care for
full history).
Signed-off-by: Eric Wong <normalperson@yhbt.net>
We don't need them anymore, all the rough points of
the --follow-parent implementation have been worked out.
The only improvement in the future will probably be
--follow-parent-harder, which will track subdirectories and
follow individual file history (so annotate/blame can be
complete); but that is still a ways off.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
These checks were needed before git-svn got smarter about
match_paths() and using path information returned by get_log().
We also have extra checking against fetching revisions
out-of-order these days; so we don't have to worry about that as
much. We also check for tree deletions in match_paths() and
skip those as well.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
We can have a branch that was deleted, then re-added under the
same name but copied from another path, in which case we'll have
multiple parents (we don't want to break the original ref, nor
lose copypath info).
Add a test for this, too, of course.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
This is similar to the way git proper handles refs, except we
use the keys 'branches' and 'tags' to distinguish when we want
to use wildcards.
The left-hand side of the ':' contains the remote path, and must
have one asterisk ('*') in it for the branch name. The asterisk
may be in any component of the path as long as is it on its own
directory level.
The right-hand side contains the refname and must have the
asterisk as the last path component.
branches = branches/*:refs/remotes/*
tags = tags/*:refs/remotes/tags/*
Signed-off-by: Eric Wong <normalperson@yhbt.net>
This is an optimization that should conserve network
bandwidth on certain repositories and configurations.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
It can be confusing and redundant, since historically the
default remote ref (not remote itself) has been "git-svn", too.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Since fetch_loop_common starts from the lowest revision number
in a group of Git::SVN objects; we want to avoid refetching
get_log for current users for things we've already cut it.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
This was originally needed before we used the delta fetcher and
had a less-clean follow-parent implementation that could leave
holes in the history.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
It looks better (like [remote "origin"]) instead of whatever
refname came up first in our directory traversal. Of course
--remote= overrides this.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Having 'fetch' entries in the config file created from
--follow-parent is wasteful because it can cause *future* of
invocations to follow revisions we were never interested in
in the first place.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Using buffered IO for reading 40-41 bytes at a time isn't very
efficient. Buffering writes for a short duration is alright
since we close() right away and buffers will be flushed.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Prefill .rev_db to the maximum revision we tried to fetch;
and take advantage of that so we can avoid using get_log()
on ranges we've already seen (and have deemed uninteresting).
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Defer any signals that cause termination while they are
updating; and put the update-ref call as close to the rename()
as possible. Also, make things extra-safe (but slower) for
people using --no-metadata since they can't rely on .rev_db
being rebuilt if it's clobbered (well, I'm calling update-ref
with the -m flag for reflogs, we don't yet have a way to rebuild
.rev_db from reflogs.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Passing very large strings as arguments is bad for memory usage
as it never seems to get freed in Perl. The .rev_db format is
already not optimized for projects with sparse history.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
get_log with explicit paths is the safest way to get revisions
that change a particular path we're interested in.
Unfortunately that means we still have to run get_log multiple
times for each path we're interested in, and even more if
a path gets deleted.
The first argument of get_log() is an array reference, but we
shouldn't use more than one element in that array ref because
the non-existence of _one_ of those paths for a particular range
would cause an error for all paths in that range, so yes, we
need multiple get_log calls to be on the safe side...
Signed-off-by: Eric Wong <normalperson@yhbt.net>
--svn-remote allows the default remote name to be overridden (useful
for tracking multiple SVN repositories).
Signed-off-by: Eric Wong <normalperson@yhbt.net>
We no longer delete the top-level directory even if it got
deleted from the upstream repository. In gs_do_update; we
double-check that the path we're tracking exists at both
endpoints before proceeding. We have also added additional
protection against fetching revisions out-of-order.
To simplify our internal interfaces, I've disabled passing the
'recursive' flag to the gs_do_{switch,update} wrapper functions
since we always want it in git-svn. We also pass the
entire Git::SVN object rather than just the path because it
helped me debug.
When printing progress, the refname is printed out to make
it less confusing when multi-fetch is running.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Since refs/remotes/* are not automatically cloned, we expect the
user to be capable of copying those references themselves
anyways.
Also removed the documentation for --ignore-nodate while we're
at it; it has also been made automatic.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
We were still skipping path information from get_log if we are
tracking /r9270/drunk/subversion/bindings/..., but got something
like this in the log:
A /r9270/drunk (from /r9270/trunk:14)
Signed-off-by: Eric Wong <normalperson@yhbt.net>
I can't seem to figure out what I or the SVN libraries are doing
wrong, but it appears to be related to reparent and probably
some global structure that gets reset if multiple SVN
connections are being used.
So now, in order to use do_switch; we'll open a new connection
to the repository with the complete URL; but we can't seem to
ever use an existing Ra object after another one has been
created...
Signed-off-by: Eric Wong <normalperson@yhbt.net>
We don't need our own error handler for other operations. Also
add a message about the successfully do_switch or do_update in
follow-parent for debugging do_switch failures with svn:// and
svn+ssh:// connections.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Historically, git-svn did not always use Digest::MD5 because
it did not use the SVN::Delta::Editor interfaces. Nowadays
it does, and the requires make strace more noisy.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Using path names as refnames breaks horribly if a user is
tracking one large, toplevel directory, and a lower-level
directory is followed from another project is a parent
of another ref, as it will cause refnames such as:
'refs/remotes/trunk/path/to/stuff', which will conflict
with a refname of 'refs/remotes/trunk'.
Now we just append @$revno to the end of it the current
refname. And if we have followed back to a grandparent, then
we'll strip any existing '@$parent_revno' strings before
appending our own '@$revno' string to it.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
The do_update or do_switch functions in SVN only allow for a
single path component; so 'path/to/deep/dir' would be
interpreted as 'path'.
SVN 1.4.x has a reparent function that can let us change the
session to use a higher-level root of the repository, so we can
use that for do_switch (which still doesn't seem to work in SVN
1.4.3 (a fix was attempted, but they missed the rest of the
typemap changes needed in trunk...)).
On the do_update side, we can use set_path on higher level
directories and set them to a newer revision so they don't get
updated. We can't do this with do_switch, either, because the
relative path we're tracking can change (directory moving into
a child of itself).
Because of these changes, we need to double check that our Fetch
editor is correctly performing stripping on any prefixed paths
from update, otherwise we'll just die() because that would be
a bug.
Added a test case which helped me notice and fix problems with
do_switch, too.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Also, this should allow for the tracking of new, but empty
directories where we would want to see the log message.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Since single fetching is a special case of multi-fetch,
share code with it and the fetch loop into Git::SVN::Ra
since it uses a single Ra connection and multiple
Git::SVN objects.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Before, we needed a separate svn_ra instance to run
our check_path calls once the editor was active; but
we can avoid that by running all the check_path calls
before our editor is active.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
I broke this part with the URL minimization; since
git-svn will now try to connect to the root of
the repository and will end up writing files
there if it can...
Signed-off-by: Eric Wong <normalperson@yhbt.net>
svn_log_changed_path_t structs were being used out of scope
outside of svn_ra_get_log (because I wanted to eventually be
able to use git-svn with only a single connection to the
repository). So now we dup them into a hash.
This was fixed while making --follow-parent fetches more
efficient. I've moved parsing of the command-line --revision
argument outside of the Git::SVN module so Git::SVN::fetch() can
be used in more places (such as find_parent_branch).
Signed-off-by: Eric Wong <normalperson@yhbt.net>
git-svn has never been able to handle deleted branches very well
because svn_ra_get_log() is all-or-nothing, meaning that if the
max revision passed to it does not contain the path we're
tracking, we miss all the revisions in the repository.
Branches fetched using --follow-parent still do this
sub-optimally (will be fixed soon). --follow-parent will soon
become the default, so we will assume that when using get_log();
We will also avoid tracking revprops for revisions with no
path-related changes since otherwise we just end up pulling
logs to paths we don't care about.
Also added a test for this to t9104-git-svn-follow-parent.sh and
correctly commit the log message in the preceeding test (which
conflicted with a filename).
Signed-off-by: Eric Wong <normalperson@yhbt.net>
They simply aren't interesting to track, and this will allow
us to avoid get_log().
Since r0 is covered by this, we need to update the tests to not
rely on r0 (which is always empty).
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Introducing Git::IndexInfo. This module will probably be useful
outside of git-svn, so I'm not putting it in the Git::SVN
namespace.
This will allow me to more easily avoid the use of get_log() in
the future and simply run do_update in incrementing ranges.
get_log() should be avoided because there are cases where
moved/deleted directories do not track correctly (until
--follow-parent is run on a new branch).
Signed-off-by: Eric Wong <normalperson@yhbt.net>
This means that tracking the path of:
/another-larger/trunk/thunk/bump/thud inside a repository
would follow:
/larger-parent/trunk/thunk/bump/thud
even if the svn log output looks like this:
--------------------------------------------
Changed paths:
A /another-larger (from /larger-parent:5)
--------------------------------------------
Note: the usage of get_log() in git-svn still makes a
an assumption that shouldn't be made with regard to
revisions existing for a particular path.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
This allows connections to be used more efficiently and not require
users to run 'git-svn migrate --minimize' for new repositories.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Tests always ran 'git init' before we ran so that repo-config
would always have something to read. However that does not work
in real-world situations where the user expects 'git svn init'
to work without running 'git init' first.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
It's not really useful anymore now that we have a better
--follow-parent for the valid cases. Any other use
of it is not valid.
Signed-off-by: Eric Wong <normalperson@yhbt.net>