A corrupt Subversion-format delta can request reads past the end of
the preimage. Set sliding_view::max_off so such corruption is caught
as soon as it appears, rather than blocking in an impossible-to-fulfill
read() when input comes from a socket or pipe.
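A minimal sketch of the idea in C (the struct fields mirror the
description above; check_window_bounds is an illustrative helper, not
the exact vcs-svn code):

    #include <sys/types.h>  /* off_t */
    #include <stddef.h>     /* size_t */

    struct sliding_view {
        off_t off;      /* start of the window within the file */
        size_t width;   /* number of valid bytes in the window */
        off_t max_off;  /* length of the preimage, or -1 if unknown */
    };

    /* Reject window moves that would read past the declared preimage end. */
    static int check_window_bounds(const struct sliding_view *view,
                                   off_t new_off, size_t new_width)
    {
        if (view->max_off >= 0 &&
            new_off + (off_t)new_width > view->max_off)
            return -1;  /* corrupt delta: reads past end of preimage */
        return 0;
    }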
Inspired-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Handle input in Subversion's dumpfile format, version 3. This is the
format produced by "svnrdump dump" and "svnadmin dump --deltas", and
the main difference between v3 dumpfiles and the dumpfiles already
handled is that these can include nodes whose properties and text are
expressed relative to some other node.
To handle such nodes, we:
- find which node the text and properties are based on,
- handle its property changes,
- use the cat-blob command to request the basis blob from the
  fast-import backend,
- use the svndiff0_apply() helper to apply the text delta on the fly,
  writing output to a temporary file, and
- measure that postimage file's length and write its content to the
  fast-import stream.
The temporary postimage file is shared between delta-using nodes to
avoid some file system overhead.
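The measure-and-stream step at the end might look like this sketch
(emit_postimage is a hypothetical name; the "data <count>" framing is
fast-import's actual syntax):

    #include <stdio.h>

    /* Measure the temporary postimage file, announce its length with a
     * fast-import "data" header, then stream its content. */
    static int emit_postimage(FILE *out, FILE *postimage)
    {
        long len;
        int c;

        if (fseek(postimage, 0, SEEK_END))
            return -1;
        len = ftell(postimage);
        if (len < 0 || fseek(postimage, 0, SEEK_SET))
            return -1;
        fprintf(out, "data %ld\n", len);
        while ((c = fgetc(postimage)) != EOF)
            fputc(c, out);
        return ferror(postimage) || ferror(out) ? -1 : 0;
    }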
The svn-fe interface needs to be more complicated to accommodate the
backward flow of information from the fast-import backend to svn-fe.
The backflow fd is not needed when parsing streams without deltas,
though, so existing scripts using svn-fe on v2 dumps should
continue to work.
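For reference, a cat-blob response on the backflow channel looks like
"<sha1> SP 'blob' SP <size> LF" followed by the blob content and a
trailing LF (see git-fast-import(1)). A sketch of parsing the header
line (parse_cat_blob_header is a hypothetical helper, not the actual
svn-fe code):

    #include <stdio.h>
    #include <string.h>

    /* Extract the announced size from a "<sha1> blob <size>" header;
     * returns -1 for anything else (e.g. a "missing" response). */
    static long parse_cat_blob_header(const char *line)
    {
        char sha1[41];
        long size;

        if (sscanf(line, "%40s blob %ld", sha1, &size) != 2)
            return -1;
        if (strlen(sha1) != 40)
            return -1;
        return size;
    }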
NEEDSWORK: generalize interface so caller sets the backflow fd, close
temporary file before exiting
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
* db/delta-applier:
vcs-svn: let deltas use data from preimage
vcs-svn: let deltas use data from postimage
vcs-svn: verify that deltas consume all inline data
vcs-svn: implement copyfrom_data delta instruction
vcs-svn: read instructions from deltas
vcs-svn: read inline data from deltas
vcs-svn: read the preimage when applying deltas
vcs-svn: parse svndiff0 window header
vcs-svn: skeleton of an svn delta parser
vcs-svn: make buffer_read_binary API more convenient
vcs-svn: learn to maintain a sliding view of a file
Makefile: list one vcs-svn/xdiff object or header per line
Conflicts:
    Makefile
    vcs-svn/LICENSE
* db/svn-fe-code-purge:
vcs-svn: drop obj_pool
vcs-svn: drop treap
vcs-svn: drop string_pool
vcs-svn: pass paths through to fast-import
Conflicts:
    vcs-svn/fast_export.c
    vcs-svn/fast_export.h
    vcs-svn/repo_tree.c
    vcs-svn/repo_tree.h
    vcs-svn/string_pool.c
    vcs-svn/svndump.c
    vcs-svn/trp.txt
This teaches svn-fe to incrementally import into an existing
repository (at last!) at the expense of a less convenient UI. Think of
it as growing pains. This opens the door to many excellent things,
and it would be a bad idea to discourage people from building on it
for much longer.
* db/vcs-svn-incremental:
vcs-svn: avoid using ls command twice
vcs-svn: use mark from previous import for parent commit
vcs-svn: handle filenames with dq correctly
vcs-svn: quote paths correctly for ls command
vcs-svn: eliminate repo_tree structure
vcs-svn: add a comment before each commit
vcs-svn: save marks for imported commits
vcs-svn: use higher mark numbers for blobs
vcs-svn: set up channel to read fast-import cat-blob response
Conflicts:
    t/t9010-svn-fe.sh
    vcs-svn/fast_export.c
    vcs-svn/fast_export.h
    vcs-svn/repo_tree.c
    vcs-svn/svndump.c
On systems where the local time and file modification time may be out of
sync (e.g. test directory on NFS) t3306 and t5305 can fail because prune
compares times such as "now" (client time) with file modification times
(server times for remote file systems). I.e., these are spurious test
failures.
Avoid this by setting the relevant modification times to the local time.
Noticed on a system with as little as 2s time skew.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/maint-remote-mirror-safer:
remote: deprecate --mirror
remote: separate the concept of push and fetch mirrors
remote: disallow some nonsensical option combinations
* jl/submodule-fetch-on-demand:
fetch/pull: Describe --recurse-submodule restrictions in the BUGS section
submodule update: Don't fetch when the submodule commit is already present
fetch/pull: Don't recurse into a submodule when commits are already present
Submodules: Add 'on-demand' value for the 'fetchRecurseSubmodule' option
config: teach the fetch.recurseSubmodules option the 'on-demand' value
fetch/pull: Add the 'on-demand' value to the --recurse-submodules option
fetch/pull: recurse into submodules when necessary
Conflicts:
    builtin/fetch.c
    submodule.c
For a pull into an unborn branch, we do not use "git merge"
at all. Instead, we call read-tree directly. However, we
used the --reset parameter instead of "-m", and --reset turns
off read-tree's safety features.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/format-patch-multiline-header:
format-patch: rfc2047-encode newlines in headers
format-patch: wrap long header lines
strbuf: add fixed-length version of add_wrapped_text
The t2019-checkout-ambiguous-ref.sh tests added in v1.7.4.3~12^2
examine the output for a translatable string and must be marked
with C_LOCALE_OUTPUT; otherwise, GETTEXT_POISON=YesPlease test
runs will break.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During a merge, module_list returns conflicting submodules several
times (stage 1, 2, 3), which causes the submodules to be used multiple
times by the git submodule init, sync, update and status commands.
There are 5 callers of module_list; they all read the (mode, sha1,
stage, path) tuple, and most of them care only about path. As a first
approximation, it should be OK (in the sense that it does not make
things worse than the current behaviour) to filter the duplicate
paths from module_list output, but some callers should change their
behaviour when the merge in the superproject still has conflicts.
Notice the higher-stage entries, and emit only one record from
module_list, but while doing so, mark the entry with "U" (not [0-3]) in
the $stage field and null out the SHA-1 part, as the object name for the
lowest stage does not give any useful information to the caller, and this
way any caller that uses the object name would hopefully barf. Then
update the codepaths for each subcommand this way:
- "update" should not touch the submodule repository, because we do not
know what commit should be checked out yet.
- "status" reports the conflicting submodules as 'U000...000' and does
not recurse into them (we might later want to make it recurse).
- The command called by "foreach" may want to do whatever it wants to do
by noticing the merged status in the superproject itself, so feed the
path to it from module_list as before, but only once per submodule.
- "init" and "sync" are unlikely things to do while the superproject is
still not merged, but as long as a submodule is there in $path, there
is no point skipping it. It might however want to take the merged
status of .gitmodules into account, but that is outside of the scope of
this topic.
Acked-by: Jens Lehmann <Jens.Lehmann@web.de>
Thanks-to: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nicolas Morey-Chaisemartin <nicolas@morey-chaisemartin.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
contrib/thunderbird-patch-inline: do not require bash to run the script
t8001: check the exit status of the command being tested
strbuf.h: remove a tad stale docs-in-comment and reference api-doc instead
Typos: t/README
Documentation/config.txt: make truth value of numbers more explicit
git-pack-objects.txt: fix grammatical errors
parse-remote: replace unnecessary sed invocation
git-remote currently has one option, "--mirror", which sets
up a mirror configuration that can be used for either
fetching or pushing. It looks like this:
[remote "mirror"]
url = wherever
fetch = +refs/*:refs/*
mirror = true
However, a remote like this can be dangerous and confusing.
Specifically:
1. If you issue the wrong command, it can be devastating.
You are not likely to "push" when you meant to "fetch",
but "git remote update" will try to fetch it, even if
you intended the remote only for pushing. In either
case, the results can be quite destructive. An
unintended push will overwrite or delete remote refs,
and an unintended fetch can overwrite local branches.
2. The tracking setup code can produce confusing results.
The fetch refspec above means that "git checkout -b new
master" will consider refs/heads/master to come from
the remote "mirror", even if you only ever intend to
push to the mirror. It will set up the "new" branch to
track mirror's refs/heads/master.
3. The push code tries to opportunistically update
tracking branches. If you "git push mirror foo:bar",
it will see that we are updating mirror's
refs/heads/bar, which corresponds to our local
refs/heads/bar, and will update our local branch.
To solve this, we split the concept into "push mirrors" and
"fetch mirrors". Push mirrors set only remote.*.mirror,
solving (2) and (3), and making an accidental fetch write
only into FETCH_HEAD. Fetch mirrors set only the fetch
refspec, meaning an accidental push will not force-overwrite
or delete refs on the remote end.
The new syntax is "--mirror=<fetch|push>". For
compatibility, we keep "--mirror" as-is, setting up both
types simultaneously.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add two configuration variables, grep.extendedRegexp and
grep.lineNumber, to allow the user to skip typing -E and -n on the
command line, respectively.
Scripts that are meant to be used by random users and/or in random
repositories now have to use the -G and/or --no-line-number options
as appropriate to override settings in the repository or the user's
~/.gitconfig. The fact that a script does not say "git grep -n" no
longer guarantees that the output from the command will not have
line numbers.
Signed-off-by: Joe Ratterman <jratt0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid running the command being tested as an upstream of a pipe;
doing so will lose its exit status.
While at it, modernise the style of the script.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This (morally) reverts commit d280f68313,
which added some tests that are a pain to maintain and are not likely
to find bugs in git.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Jeff King <peff@peff.net>
* 'svn-fe' of git://repo.or.cz/git/jrn:
tests: kill backgrounded processes more robustly
vcs-svn: a void function shouldn't try to return something
tests: make sure input to sed is newline terminated
vcs-svn: add missing cast to printf argument
t0081 creates several background processes that write to a fifo and
then go to sleep for a while (so the reader of the fifo does not see
EOF).
Each background process is made in a curly-braced block in the shell,
and after we are done reading from the fifo, we use "kill $!" to kill
it off.
For a simple, single-command process, this works reliably and kills
the child sleep process. But for more complex commands like
"make_some_output && sleep", the results are less predictable. When
executing under bash, we end up with a subshell that gets killed by
the $! but leaves the sleep process still alive.
This is bad not only for process hygiene (we are leaving random sleep
processes to expire after a while), but also interacts badly with the
"prove" command. When prove executes a test, it does not realize the
test is done when it sees SIGCHLD, but rather waits until the test's
stdout pipe is closed. The orphaned sleep process may keep that pipe
open via test-lib's file descriptor 5, causing prove to hang for 100
seconds.
The solution is to explicitly use a subshell and to exec the final
sleep process, so that when we "kill $!" we get the process id of the
sleep process.
[jn: original patch by Jeff had some additional bits:
1. Wrap the "kill" in a test_when_finished, since we want
to clean up the process whether the test succeeds or not.
2. The "kill" is part of our && chain for test success. It
probably won't fail, but it can if the process has
expired before we manage to kill it. So let's mark it
as OK to fail.
I'm postponing that for now.]
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Otherwise the created test repositories will be affected by the
user's ~/.gitconfig. For example, setting core.logAllrefupdates in
the user's config will make all calls to "git config --unset
core.logAllrefupdates" fail, which will break the first test that
uses that statement and expects it to succeed.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
POSIX only requires sed to work on text files, and because it does
not end with a newline, this commit's content is not a text file.
Add a newline to fix it. Without this change, OS X sed helpfully
adds a newline to actual.message, causing t9010.13 to fail.
Reported-by: Torsten Bögershausen <tboegi@web.de>
Tested-by: Brian Gernhardt <benji@silverinsanity.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
In commit 95a1d12e9b ("tests: scrub environment of GIT_* variables") all
environment variables starting with "GIT_" were unset for the tests using
a perl script rather than unsetting them one by one. Only three exceptions
were made to make them work as before: "GIT_TRACE*", "GIT_DEBUG*" and
"GIT_USE_LOOKUP".
Unfortunately some environment variables used by the test framework itself
were not added to the exceptions and thus stopped working when given
before the make command instead of after it. Those are:
- GIT_NOTES_TIMING_TESTS
- GIT_PATCHID_TIMING_TESTS
- GIT_PROVE_OPTS
- GIT_REMOTE_SVN_TEST_BIG_FILES
- GIT_SKIP_TESTS
- GIT_TEST*
- GIT_VALGRIND_OPTIONS
I noticed that skipping a test the way I was used to suddenly failed:
    GIT_SKIP_TESTS='t1234' GIT_TEST_OPTS='--root=/dev/shm' make -j10 test
This should work according to t/README, but didn't anymore, so let's
fix that by adding them to the exception list. And to avoid having a
long regexp, put the exceptions in a separate variable, using nicer
formatting.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Thanks-to: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The copyfrom_source instruction appends data from the preimage buffer
to the end of output. Its arguments are a length and an offset
relative to the beginning of the source view.
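In other words (an illustrative sketch, not the exact vcs-svn code):

    #include <string.h>

    /* Append n bytes starting at offset off of the source view to the
     * output buffer; reject reads outside the view. */
    static int copyfrom_source(char *out, size_t *out_len,
                               const char *src, size_t src_len,
                               size_t off, size_t n)
    {
        if (off > src_len || n > src_len - off)
            return -1;  /* delta reads outside the source view */
        memcpy(out + *out_len, src + off, n);
        *out_len += n;
        return 0;
    }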
With this change, the delta applier is able to reproduce all 5,636,613
blobs in the early history of the ASF repository. Tested with
    mkfifo backflow
    svn-fe <svn-asf-public-r0:940166 3<backflow |
        git fast-import --cat-blob-fd=3 3>backflow
with svn-asf-public-r0:940166 produced by whatever version of
Subversion the dumps in /dump/ on svn.apache.org use (presumably
1.6.something).
Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
The copyfrom_target instruction appends data that is already present
in the current output view to the end of output. (The offset
argument is relative to the beginning of output produced in the
current window.)
The region copied is allowed to run past the end of the existing
output. To support that case, copy one character at a time rather
than calling memcpy or memmove. This allows copyfrom_target to be
used once to repeat a string many times. For example:
    COPYFROM_DATA 2
    COPYFROM_OUTPUT 10, 0
    DATA "ab"
would produce the output "abababababab".
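The loop behind this might look like the following sketch
(illustrative, not the exact vcs-svn code); because each iteration may
read a byte written by an earlier iteration, the 2-byte seed plus a
10-byte overlapping copy yields the 12-byte output above:

    #include <stddef.h>

    /* Append n bytes, starting at offset off of the output produced in
     * this window, to that same output.  The regions may overlap, which
     * is what allows a short string to be repeated. */
    static size_t copyfrom_target(char *out, size_t out_len,
                                  size_t off, size_t n)
    {
        while (n--)
            out[out_len++] = out[off++];  /* byte-at-a-time, on purpose */
        return out_len;
    }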
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
By constraining the format of deltas, we can more easily detect
corruption and other breakage.
Requiring deltas not to provide unconsumed data also opens the
possibility of ignoring the declared amount of novel data and simply
streaming the data as needed to fulfill copyfrom_data requests.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
The copyfrom_data instruction copies a few bytes verbatim from the
novel text section of a window to the postimage.
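A sketch of the mechanics (illustrative names, not the exact vcs-svn
code):

    #include <string.h>

    /* Append the next n bytes of the window's inline ("novel") data to
     * the output, advancing the cursor so the next instruction continues
     * where this one stopped. */
    static int copyfrom_data(char *out, size_t *out_len,
                             const char *data, size_t data_len,
                             size_t *data_pos, size_t n)
    {
        if (n > data_len - *data_pos)
            return -1;  /* more inline data requested than remains */
        memcpy(out + *out_len, data + *data_pos, n);
        *out_len += n;
        *data_pos += n;
        return 0;
    }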
[jn: with memory leak fix from David]
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Buffer the instruction section upon encountering it for later
interpretation.
An alternative design would involve parsing the instructions
at this point and buffering them in some processed form. Using
the unprocessed form is simpler.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>