Given a list of <pack>.idx files, this command validates the
index file and the corresponding .pack file for consistency.
This patch also uses the same validation mechanism in fsck-cache
when the --full flag is used.
During normal operation, sha1_file.c verifies that a given .idx
file matches the .pack file by comparing the SHA1 checksum
stored in .idx file and .pack file as a minimum sanity check.
We may further want to check the pack signature and version when
we map the pack, but that would be a separate patch.
Earlier, errors to map a pack file was not flagged fatal but led
to a random fatal error later. This version explicitly die()s
when such an error is detected.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we prefer 0 as maxsize for diff_delta() to say "unlimited", let's be
consistent about it.
This patch also fixes type mismatch in a call to get_delta_hdr_size()
from packed_delta_info().
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a wrap-up patch including all the cleanups I've done to the
delta code and its usage. The most important change is the
factorization of the delta header handling code.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes it match the new delta encoding, and admittedly makes the
code easier to follow.
This also updates the PACK file version to 2, since this (and the delta
encoding change in the previous commit) are incompatible with the old
format.
The commands git-fsck-cache and probably git-*-pull needs to have a way
to enumerate objects contained in packed GIT archives and alternate
object pools. This commit exposes the data structure used to keep track
of them from sha1_file.c, and adds a couple of accessor interface
functions for use by the enhanced git-fsck-cache command.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This was causing random segfaults, because use_packed_git() got
confused by random garbage there.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This also adds a header with a signature, version info, and the number
of objects to the pack file. It also encodes the file length and type
more efficiently.
The initial one was not doing enough to figure things out
without uncompressing too much. It also fixes a potential
segfault resulting from missing use_packed_git() call.
We would need to introduce unuse_packed_git() call and do proper
use counting to figure out when it is safe to unmap, but
currently we do not unmap packed file yet.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now, there's still a misfeature there, which is that when you
create a new object, it doesn't check whether that object
already exists in the pack-file, so you'll end up with a few
recent objects that you really don't need (notably tree
objects), and this patch fixes it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES can
have the "pack" subdirectory that houses "packed GIT" files
produced by git-pack-objects (e.g. .git/objects/pack/foo.pack
and .git/objects/pack/foo.idx; always store them as pairs). The
following functions in sha1_file.c can then read object contents
from such packed file:
- sha1_object_info()
- has_sha1_file()
- read_sha1_file()
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This lets us eliminate one use of map_sha1_file() outside
sha1_file.c, to bring us one step closer to the packed GIT.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Packed delta files created by git-pack-objects seems to be the
way to go, and existing "delta" object handling code has exposed
the object representation details to too many places. Remove it
while we refactor code to come up with a proper interface in
sha1_file.c.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is a patch that fixes several gcc4 warnings about different signedness,
all between char and unsigned char. I tried to keep the patch minimal
so resertod to casts in three places.
Signed-off-by: Mika Kukkonen <mikukkon@iki.fi>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make 'sha1' parameters const where possible
Signed-off-by: Jason McMullan <jason.mcmullan@timesys.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds code to read a hash out of a specified file under
{GIT_DIR}/refs/, and to write such files atomically and optionally with an
compare and lock.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds sha1_file_size() helper function and uses it in the
rename/copy similarity estimator. The helper function handles
deltified object as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a remote repository is deltified, we need to get the
objects that a deltified object we want to obtain is based upon.
The initial parts of each retrieved SHA1 file is inflated and
inspected to see if it is deltified, and its base object is
asked from the remote side when it is. Since this partial
inflation and inspection has a small performance hit, it can
optionally be skipped by giving -d flag to git-*-pull commands.
This flag should be used only when the remote repository is
known to have no deltified objects.
Rsync transport does not have this problem since it fetches
everything the remote side has.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make a separate helper for parsing the header of an object file
(really carefully) and for unpacking the rest. This means that
anybody who uses the "unpack_sha1_header()" interface can easily
look at the header and decide to unpack the rest too, without
doing any extra work.
It's for people who aren't necessarily interested in the whole
unpacked file, but do want to know the header information (size,
type, etc..)
For example, the delta code can use this to figure out whether
an object is already a delta object, and what it is a delta
against, without actually bothering to unpack all of the actual
data in the delta.
Add <limits.h> to the include files handled by "cache.h", and remove
extraneous #include directives from various .c files. The rule is that
"cache.h" gets all the basic stuff, so that we'll have as few system
dependencies as possible.
This makes the core code aware of delta objects and undeltafy them as
needed. The convention is to use read_sha1_file() to have
undeltafication done automatically (most users do that already so this
is transparent).
If the delta object itself has to be accessed then it must be done
through map_sha1_file() and unpack_sha1_file().
In that context mktag.c has been switched to read_sha1_file() as there
is no reason to do the full map+unpack manually.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix various things that sparse complains about:
- use NULL instead of 0
- make sure we declare everything properly, or mark it static
- use proper function declarations ("fn(void)" instead of "fn()")
Sparse is always right.
- Raw hashes should be unsigned char.
- String functions want signed char.
- Hash and compress functions want unsigned char.
Signed-off By: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
During the mailing list discussion on renaming GIT_ environment
variables, people felt that having one environment that lets the
user (or Porcelain) specify both SHA1_FILE_DIRECTORY (now
GIT_OBJECT_DIRECTORY) and GIT_INDEX_FILE for the default layout
would be handy. This change introduces GIT_DIR environment
variable, from which the defaults for GIT_INDEX_FILE and
GIT_OBJECT_DIRECTORY are derived. When GIT_DIR is not defined,
it defaults to ".git". GIT_INDEX_FILE defaults to
"$GIT_DIR/index" and GIT_OBJECT_DIRECTORY defaults to
"$GIT_DIR/objects".
Special thanks for ideas and discussions go to Petr Baudis and
Daniel Barkalow. Bugs are mine ;-)
Signed-off-by: Junio C Hamano <junkio@cox.net>
H. Peter Anvin mentioned that using SHA1_whatever as an
environment variable name is not nice and we should instead use
names starting with "GIT_" prefix to avoid conflicts. Here is
what this patch does:
* Renames the following environment variables:
New name Old Name
GIT_AUTHOR_DATE AUTHOR_DATE
GIT_AUTHOR_EMAIL AUTHOR_EMAIL
GIT_AUTHOR_NAME AUTHOR_NAME
GIT_COMMITTER_EMAIL COMMIT_AUTHOR_EMAIL
GIT_COMMITTER_NAME COMMIT_AUTHOR_NAME
GIT_ALTERNATE_OBJECT_DIRECTORIES SHA1_FILE_DIRECTORIES
GIT_OBJECT_DIRECTORY SHA1_FILE_DIRECTORY
* Introduces a compatibility macro, gitenv(), which does an
getenv() and if it fails calls gitenv_bc(), which in turn
picks up the value from old name while giving a warning about
using an old name.
* Changes all users of the environment variable to fetch
environment variable with the new name using gitenv().
* Updates the documentation and scripts shipped with Linus GIT
distribution.
The transition plan is as follows:
* We will keep the backward compatibility list used by gitenv()
for now, so the current scripts and user environments
continue to work as before. The users will get warnings when
they have old name but not new name in their environment to
the stderr.
* The Porcelain layers should start using new names. However,
just in case it ends up calling old Plumbing layer
implementation, they should also export old names, taking
values from the corresponding new names, during the
transition period.
* After a transition period, we would drop the compatibility
support and drop gitenv(). Revert the callers to directly
call getenv() but keep using the new names.
The last part is probably optional and the transition
duration needs to be set to a reasonable value.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This fixes stylistic problems and one unused variable spotted by
Petr Baudis. The buf variable unused in prepare_alt_odb() is
gone and the "creepy" function is more heavily documented.
Signed-off-by: Junio C Hamano <junkio@cox.net>
A deflate loop in sha1_file.c would have /* nothing */ as its
body, but the semicolon was missing, so the next command was run.
Fortunately the loop went through exactly once so it didn't trigger
an actual bug so far.
Signed-Off-by: Thomas Glanzmann <sithglan@stud.uni-erlangen.de>
Signed-off-by: Petr Baudis <pasky@ucw.cz>
<JC> Editorial Note. We may want to include standard headers in one
of those headers everybody includes, e.g. cache.h, to reduce clutters,
but this commit is as Thomas posted to the GIT list.
Date: Sat, 7 May 2005 10:41:41 +0200
Signed-off-by: Thomas Glanzmann <sithglan@stud.uni-erlangen.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This does not matter for commands that write just a handful SHA1 files,
but is noticeable in git-convert-cache which essentially traverses the
entire object database.
Signed-off-by: Junio C Hamano <junkio@cox.net>
SHA1_FILE_DIRECTORIES environment variable is a colon separated paths
used when looking for SHA1 files not found in the usual place for
reading. Creating a new SHA1 file does not use this alternate object
database location mechanism. This is useful to archive older, rarely
used objects into separate directories.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Make it much safer: we write to a temporary file, and then link that
temporary file to the final destination. This avoids all the nasty
races if several people write the same object at the same time.
It should also result in nicer on-disk layout, since it means that
objects all get created in the same subdirectory. That makes a lot
of block allocation algorithms happier, since the objects will now
be allocated from the same zone.
A new command, git-write-blob, is introduced. This registers
the contents of any file on the filesystem as a blob in the
object database and reports its SHA1 to the standard output.
To implement it, the patch promotes index_fd() from a static
function in update-cache.c to extern and moves it to a library
source, sha1_file.c.
This command is used to update git-merge-one-file-script so that
it does not smudge the work tree.
Signed-off-by: Junio C Hamano <junkio@cox.net>
If somebody wants it later, we can re-do it, but for now we consider
it an experiment that wasn't worth it. Git will still honor symbolic
names, it just won't look up parents for you.
Of course, you can always do it by hand if you want to.
It uses the jit syntax, at least for now. 0-xxxx is the first parent of xxxx,
while 1-xxxx is the second, and so on. You can use just "-xxxx" for the first
parent, but a lot of commands will think that the initial '-' implies a
command line flag.
This allows the programs to use various simplified versions of
the SHA1 names, eg just say "HEAD" for the SHA1 pointed to by
the .git/HEAD file etc.
For example, this commit has been done with
git-commit-tree $(git-write-tree) -p HEAD
instead of the traditional "$(cat .git/HEAD)" syntax.
This patch renames read_tree_with_tree_or_commit_sha1() to
read_object_with_reference() and extends it to automatically
dereference not just "commit" objects but "tag" objects. With
this patch, you can say e.g.:
ls-tree $tag
read-tree -m $(merge-base $tag $HEAD) $tag $HEAD
diff-cache $tag
diff-tree $tag $HEAD
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce xmalloc and xrealloc to die gracefully with a descriptive
message when out of memory, rather than taking a SIGSEGV.
Signed-off-by: Christopher Li<chrislgit@chrisli.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is how to trigger it:
echo blob 100 > .git/objects/00/ae4e8d3208e09f2cf7a38202a126f728cadb49
Then run fsck-cache. It will try to unpack after the header to calculate
the hash, inflate returns total_out == 0 and memcpy() dies.
The patch below seems to work with ZLIB 1.1 and 1.2.
Signed-off-by: Andreas Gal <gal@uci.edu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds two functions: one to check if an object is present in the local
database, and one to add an object to the local database by reading it
from a file descriptor and checking its hash.
Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We really don't care about atime, and it sucks to dirty the
inode cache just for it.
This is more than a one-liner only because we need to be able to
clear the O_NOATIME flag in case some of the objects are owned
by others (in which case open will return EPERM), and because not
everybody has the O_NOATIME flag.