When the repository on the remote side is corrupted, rev-list
spawned from upload-pack would die with error, but pack-objects
that reads from the rev-list happily created a packfile that can
be unpacked by the downloader. When this happens, the resulting
packfile is not corrupted and unpacks cleanly, but the list of
the objects contained in it is not what the protocol exchange
computed.
This update makes upload-pack to monitor its subprocesses, and
when either of them dies with error, sends an incomplete pack
data to the downloader to cause it to fail.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.
In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.
This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.
There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.
Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Mark Wooding noticed there was a type mismatch warning in git.c; this
patch does things slightly differently (mostly tightening const) and
was what I was holding onto, waiting for the setup-revisions change
to be merged into the master branch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/rev-list:
rev-list --objects: use full pathname to help hashing.
rev-list --objects-edge: remove duplicated edge commit output.
rev-list --objects-edge
* jc/pack-thin:
pack-objects: hash basename and direname a bit differently.
pack-objects: allow "thin" packs to exceed depth limits
pack-objects: use full pathname to help hashing with "thin" pack.
pack-objects: thin pack micro-optimization.
Use thin pack transfer in "git fetch".
Add git-push --thin.
send-pack --thin: use "thin pack" delta transfer.
Thin pack - create packfile with missing delta base.
Conflicts:
pack-objects.c (taking "next")
send-pack.c (taking "next")
The git suite may not be in PATH (and thus programs such as
git-send-pack could not exec git-rev-list). Thus there is a need for
logic that will locate these programs. Modifying PATH is not
desirable as it result in behavior differing from the user's
intentions, as we may end up prepending "/usr/bin" to PATH.
- git C programs will use exec*_git_cmd() APIs to exec sub-commands.
- exec*_git_cmd() will execute a git program by searching for it in
the following directories:
1. --exec-path (as used by "git")
2. The GIT_EXEC_PATH environment variable.
3. $(gitexecdir) as set in Makefile (default value $(bindir)).
- git wrapper will modify PATH as before to enable shell scripts to
invoke "git-foo" commands.
Ideally, shell scripts should use the git wrapper to become independent
of PATH, and then modifying PATH will not be necessary.
[jc: with minor updates after a brief review.]
Signed-off-by: Michal Ostrowski <mostrows@watson.ibm.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch basically just removes the redundant code from
{receive,upload}-pack.c in favour of the library code in path.c.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
One caller of deref_tag() was not careful enough to make sure
what deref_tag() returned was not NULL (i.e. we found a tag
object that points at an object we do not have). Fix it, and
warn about refs that point at such an incomplete tag where
needed.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This implements three things (trying very hard to be backwards
compatible):
It sends the "multi_ack" capability via the mechanism proposed by
Sergey Vlasov.
When the client sends "multi_ack" with at least one "want", multi_ack
is enabled.
When multi_ack is enabled, "continue" is appended to each "ACK" until
either the server can not store more refs, or "done" is received.
In contrast to the original protocol, as long as "continue" is sent,
flushes are answered by a "NAK" (not just until an "ACK" was sent),
and if "continue" was sent at least once, the last message is an
"ACK" without "continue".
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch is based on Junio's proposal. It marks parents of common revs
so that they do not clutter up the has_sha1 array.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
upload-pack would set create_full_pack=1 if nr_has==0, but would ask later
if nr_needs<MAX_NEEDS. If that proves true, it would ignore create_full_pack,
and arguments would be written into unreserved memory.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes sure what the other end asks for are among what we
offered to give them. Otherwise we would end up running
git-rev-list with 20-byte nonsense, only to find it either die
(because the object was not found) or waste time (because we
ended up serving that phony 'client').
Also avoid wasting needs_sha1 pool to record duplicates, and
detect cloning requests better.
[this used to be on top of Johannes fetch-pack enhancements,
which we are rewinding it for further testing for now, so
the commit is rebased.]
Signed-off-by: Junio C Hamano <junkio@cox.net>
The current fetch/upload protocol works like this:
- client sends revs it wants to have via "want" messages
- client sends a flush message (message with len 0)
- client sends revs it has via "have" messages
- after one window (32 revs), a flush is sent
- after each subsequent window, a flush is sent, and an ACK/NAK is received.
(NAK means that server does not have any of the transmitted revs;
ACK sends also the sha1 of the rev server has)
- when the first ACK is received, client sends "done", and does not expect
any further messages
One special case, though:
- if no ACK is received (only NAK's), and client runs out of revs to send,
"done" is sent, and server sends just one more "NAK"
A smarter scheme, which actually has a chance to detect more than one
common rev, would be to send more than just one ACK. This patch implements
the server side of the following extension to the protocol:
- client sends at least one "want" message with "multi_ack" appended, like
"want 1234567890123456789012345678901234567890 multi_ack"
- if the server understands that extension, it will send ACK messages for all
revs it has, not just the first one
- server appends "continue" to the ACK messages like
"ACK 1234567890123456789012345678901234567890 continue"
until it has MAX_HAS-1 revs. In this manner, client knows when to
stop sending revs by checking for the substring "continue" (and
further knows that server understands multi_ack)
In this manner, the protocol stays backwards compatible, since both client
must send "want ... multi_ack" and server must answer with "ACK ...
continue" to enable the extension.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch is based on Junio's proposal. It marks parents of common revs
so that they do not clutter up the has_sha1 array.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Later round would further improve fetch-pack not to send useless "have",
but in the meantime, increase it to help upload-pack to find more common
commits, as discussed on the list.
Signed-off-by: Junio C Hamano <junkio@cox.net>
It turns out that not only did git-daemon do DWIM, but git-upload-pack
does as well. This is bad; security checks have to be performed *after*
canonicalization, not before.
Additionally, the current git-daemon can be trivially DoSed by spewing
SYNs at the target port.
This patch adds a --strict option to git-upload-pack to disable all
DWIM, a --timeout option to git-daemon and git-upload-pack, and an
--init-timeout option to git-daemon (which is typically set to a much
lower value, since the initial request should come immediately from the
client.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This updates git-ls-remote to show SHA1 names of objects that are
referred by tags, in the "ref^{}" notation.
This would make git-findtags (without -t flag) almost trivial.
git-peek-remote . |
sed -ne "s:^$target "'refs/tags/\(.*\)^{}$:\1:p'
Also Pasky could do:
git-ls-remote --tags $remote |
sed -ne 's:\( refs/tags/.*\)^{}$:\1:p'
to find out what object each of the remote tags refers to, and
if he has one locally, run "git-fetch $remote tag $tagname" to
automatically catch up with the upstream tags.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Cloning from a repository with more than 256 refs (heads and tags
included) will choke, because upload-pack has a built-in limit of
feeding not more than MAX_NEEDS (currently 256) heads to underlying
git-rev-list. This is a problem when cloning a repository with many
tags, like http://www.linux-mips.org/pub/scm/linux.git, which has 290+
tags.
This commit introduces a new flag, --all, to git-rev-list, to include
all refs in the repository. Updated upload-pack detects requests that
ask more than MAX_NEEDS refs, and sends everything back instead.
We may probably want to tweak the definitions of MAX_NEEDS and
MAX_HAS, but that is a separate topic.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Solaris 8 doesn't have the newer unsetenv() and setenv()
functions, so replace them with putenv(). The one use of
unsetenv() in fsck-cache.c now sets GIT_ALTERNATE_OBJECT_
DIRECTORIES to the empty string. Every place that var
is used, NULLs are also replaced with empty strings, so
it's ok.
Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu>
Now that git-clone-pack exists, we actually have somebody requesting
more than just a single head in a pack. So allow the Jeff's of this
world to clone things with tens of heads.
"git_path()" returns a static pathname pointer into the git directory
using a printf-like format specifier.
"head_ref()" works like "for_each_ref()", except for just the HEAD.
It returns the result SHA1 on stdout, so you can do
remote=$(git-fetch-pack host:dir branchname)
and it will unpack the objects and "remote" will be the SHA1 name of the
branch on the other side. You can then save that off, or merge it, or
whatever.
It's meant to be used by "git fetch" for the local and ssh case.
It doesn't actually do the fetching now, but it does discover the common
commit point.