GIT's guts work with a forward slash as a path separators. We do not change
that. Rather we make sure that only "normalized" paths enter the depths
of the machinery.
We have to translate backslashes to forward slashes in the prefix and in
command line arguments. Fortunately, all of them are passed through
functions in setup.c.
A macro has_dos_drive_path() is defined that checks whether a path begins
with a drive letter+colon combination. This predicate is always false on
Unix. Another macro is_dir_sep() abstracts that a backslash is also a
directory separator on Windows.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Once we find the absolute paths for git_dir and work_tree, we can make
git_dir a relative path since we know pwd will be work_tree. This should
save the kernel some time traversing the path to work_tree all the time
if git_dir is inside work_tree.
Daniel's patch didn't apply for me as-is, so I recreated it with some
differences, and here are the numbers from ten runs each.
There is some IO for me - probably due to more-or-less random flushing of
the journal - so the variation is bigger than I'd like, but whatever:
Before:
real 0m8.135s
real 0m7.933s
real 0m8.080s
real 0m7.954s
real 0m7.949s
real 0m8.112s
real 0m7.934s
real 0m8.059s
real 0m7.979s
real 0m8.038s
After:
real 0m7.685s
real 0m7.968s
real 0m7.703s
real 0m7.850s
real 0m7.995s
real 0m7.817s
real 0m7.963s
real 0m7.955s
real 0m7.848s
real 0m7.969s
Now, going by "best of ten" (on the assumption that the longer numbers
are all due to IO), I'm saying a 7.933s -> 7.685s reduction, and it does
seem to be outside of the noise (ie the "after" case never broke 8s, while
the "before" case did so half the time).
So looks like about 3% to me.
Doing it for a slightly smaller test-case (just the "arch" subdirectory)
gets more stable numbers probably due to not filling the journal with
metadata updates, so we have:
Before:
real 0m1.633s
real 0m1.633s
real 0m1.633s
real 0m1.632s
real 0m1.632s
real 0m1.630s
real 0m1.634s
real 0m1.631s
real 0m1.632s
real 0m1.632s
After:
real 0m1.610s
real 0m1.609s
real 0m1.610s
real 0m1.608s
real 0m1.607s
real 0m1.610s
real 0m1.609s
real 0m1.611s
real 0m1.608s
real 0m1.611s
where I'ld just take the averages and say 1.632 vs 1.610, which is just
over 1% peformance improvement.
So it's not in the noise, but it's not as big as I initially thought and
measured.
(That said, it obviously depends on how deep the working directory path is
too, and whether it is behind NFS or something else that might need to
cause more work to look up).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in the documentation[*] this is totally useless on
filesystems that do ordered/journalled data writes, but it can be a
useful safety feature on filesystems like HFS+ that only journal the
metadata, not the actual file contents.
It defaults to off, although we could presumably in theory some day
auto-enable it on a per-filesystem basis.
[*] Yes, I updated the docs for the thing. Hell really _has_ frozen
over, and the four horsemen are probably just beyond the horizon.
EVERYBODY PANIC!
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was implemented as a thin wrapper around an otherwise unused
helper function parse_pack_index_file(). The code becomes simpler
and easier to read by consolidating the two.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
write_sha1_from_fd() and write_sha1_to_fd() were dead code nobody called,
neither the latter's helper repack_object() was.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Particularly for the "alternates" file, if one will be created, we
want a path that doesn't depend on the current directory, but we want
to retain any symlinks in the path as given and any in the user's view
of the current directory when the path was given.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This means that we can depend on packs always being stable on disk,
simplifying a lot of the object serialization worries. And unlike loose
objects, serializing pack creation IO isn't going to be a performance
killer.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/add-n-u:
Make git add -n and git -u -n output consistent
"git-add -n -u" should not add but just report
Conflicts:
builtin-add.c
builtin-mv.c
cache.h
read-cache.c
* db/clone-in-c:
Add test for cloning with "--reference" repo being a subset of source repo
Add a test for another combination of --reference
Test that --reference actually suppresses fetching referenced objects
clone: fall back to copying if hardlinking fails
builtin-clone.c: Need to closedir() in copy_or_link_directory()
builtin-clone: fix initial checkout
Build in clone
Provide API access to init_db()
Add a function to set a non-default work tree
Allow for having for_each_ref() list extra refs
Have a constant extern refspec for "--tags"
Add a library function to add an alternate to the alternates file
Add a lockfile function to append to a file
Mark the list of refs to fetch as const
Conflicts:
cache.h
t/t5700-clone-reference.sh
* js/ignore-submodule:
Ignore dirty submodule states during rebase and stash
Teach update-index about --ignore-submodules
diff options: Introduce --ignore-submodules
* bc/repack:
Documentation/git-repack.txt: document new -A behaviour
let pack-objects do the writing of unreachable objects as loose objects
add a force_object_loose() function
builtin-gc.c: deprecate --prune, it now really has no effect
git-gc: always use -A when manually repacking
repack: modify behavior of -A option to leave unreferenced objects unpacked
Conflicts:
builtin-pack-objects.c
Make git recognize a new environment variable that prevents it from
chdir'ing up into specified directories when looking for a GIT_DIR.
Useful for avoiding slow network directories.
For example, I use git in an environment where homedirs are automounted
and "ls /home/nonexistent" takes about 9 seconds. Setting
GIT_CEILING_DIRS="/home" allows "git help -a" (for bash completion) and
"git symbolic-ref" (for my shell prompt) to run in a reasonable time.
Signed-off-by: David Reiss <dreiss@facebook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
normalize_absolute_path removes several oddities form absolute paths,
giving nice clean paths like "/dir/sub1/sub2". Also add a test case
for this utility, based on a new test program (in the style of test-sha1).
Signed-off-by: David Reiss <dreiss@facebook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ar/add-unreadable:
Add a config option to ignore errors for git-add
Add a test for git-add --ignore-errors
Add --ignore-errors to git-add to allow it to skip files with read errors
Extend interface of add_files_to_cache to allow ignore indexing errors
Make the exit code of add_file_to_index actually useful
Like with the diff machinery, update-index should sometimes just
ignore submodules (e.g. to determine a clean state before a rebase).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sb/committer:
commit: Show committer if automatic
commit: Show author if different from committer
Preparation to call determine_author_info from prepare_to_commit
git_config() only had a function parameter, but no callback data
parameter. This assumes that all callback functions only modify
global variables.
With this patch, every callback gets a void * parameter, and it is hoped
that this will help the libification effort.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is meant to force the creation of a loose object even if it
already exists packed. Needed for the next commit.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lt/core-optim:
Optimize symlink/directory detection
Avoid some unnecessary lstat() calls
is_racy_timestamp(): do not check timestamp for gitlinks
diff-lib.c: rename check_work_tree_entity()
diff: a submodule not checked out is not modified
Add t7506 to test submodule related functions for git-status
t4027: test diff for submodule with empty directory
Make git-add behave more sensibly in a case-insensitive environment
When adding files to the index, add support for case-independent matches
Make unpack-tree update removed files before any updated files
Make branch merging aware of underlying case-insensitive filsystems
Add 'core.ignorecase' option
Make hash_name_lookup able to do case-independent lookups
Make "index_name_exists()" return the cache_entry it found
Move name hashing functions into a file of its own
Make unpack_trees_options bit flags actual bitfields
Change cd67e4d4 introduced a new configuration parameter that told
pull to automatically perform a rebase instead of a merge. This
change provides a configuration option to enable this feature
automatically when creating a new branch.
If the variable branch.autosetuprebase applies for a branch that's
being created, that branch will have branch.<name>.rebase set to true.
Signed-off-by: Dustin Sallings <dustin@spy.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the base for making symlink detection in the middle fo a pathname
saner and (much) more efficient.
Under various loads, we want to verify that the full path leading up to a
filename is a real directory tree, and that when we successfully do an
'lstat()' on a filename, we don't get a false positive due to a symlink in
the middle of the path that git should have seen as a symlink, not as a
normal path component.
The 'has_symlink_leading_path()' function already did this, and cached
a single level of symlink information, but didn't cache the _lack_ of a
symlink, so the normal behaviour was actually the wrong way around, and we
ended up doing an 'lstat()' on each path component to check that it was a
real directory.
This caches the last detected full directory and symlink entries, and
speeds up especially deep directory structures a lot by avoiding to
lstat() all the directories leading up to each entry in the index.
[ This can - and should - probably be extended upon so that we eventually
never do a bare 'lstat()' on any path entries at *all* when checking the
index, but always check the full path carefully. Right now we do not
generally check the whole path for all our normal quick index
revalidation.
We should also make sure that we're careful about all the invalidation,
ie when we remove a link and replace it by a directory we should
invalidate the symlink cache if it matches (and vice versa for the
directory cache).
But regardless, the basic function needs to be sane to do that. The old
'has_symlink_leading_path()' was not capable enough - or indeed the code
readable enough - to really do that sanely. So I'm pushing this as not
just an optimization, but as a base for further work. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit sequence used to do
if (file_exists(p->path))
add_file_to_cache(p->path, 0);
where both "file_exists()" and "add_file_to_cache()" needed to do a
lstat() on the path to do their work.
This cuts down 'lstat()' calls for the partial commit case by two
for each path we know about (because we do this twice per path).
Just move the lstat() to the caller instead (that's all that
"file_exists()" really does), and pass the stat information down to the
add_to_cache() function.
This essentially makes 'add_to_index()' the core function that adds a path
to the index, getting the index pointer, the pathname and the stat
information as arguments. There are then shorthand helper functions that
use this core function:
- 'add_to_cache()' is just 'add_to_index()' with the default index
- 'add_file_to_cache/index()' is the same, but does the lstat() call
itself, so you can pass just the pathname if you don't already have the
stat information available.
So old users of the 'add_file_to_xyzzy()' are essentially left unchanged,
and this just exposes the more generic helper function that can take
existing stat information into account.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lt/case-insensitive:
Make git-add behave more sensibly in a case-insensitive environment
When adding files to the index, add support for case-independent matches
Make unpack-tree update removed files before any updated files
Make branch merging aware of underlying case-insensitive filsystems
Add 'core.ignorecase' option
Make hash_name_lookup able to do case-independent lookups
Make "index_name_exists()" return the cache_entry it found
Move name hashing functions into a file of its own
Make unpack_trees_options bit flags actual bitfields
To warn the user in case he/she might be using an unintended
committer identity.
Signed-off-by: Santi Béjar <sbejar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lh/git-file:
Teach GIT-VERSION-GEN about the .git file
Teach git-submodule.sh about the .git file
Teach resolve_gitlink_ref() about the .git file
Add platform-independent .git "symlink"
The caller first calls set_git_dir() to specify the GIT_DIR, and then
calls init_db() to initialize it. This also cleans up various parts of
the code to account for the fact that everything is done with GIT_DIR
set, so it's unnecessary to pass the specified directory around.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function may only be used before the work tree is used.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is in the core so that, if the alternates file has already been
read, the addition can be parsed and put into effect for the current
process.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This takes care of copying the original contents into the replacement
file after the lock is held, so that concurrent additions can't miss
each other's changes.
[jc: munged to drop mmap in favor of copy_file.]
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xread() and xwrite() return ssize_t values as their native POSIX
counterparts read(2) and write(2).
To be consistent, read_in_full() and write_in_full() should also return
ssize_t values.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes a struct ref able to represent a symref, and makes http.c
able to recognize one, and makes transport.c look for "HEAD" as a ref
in the list, and makes it dereference symrefs for the resulting ref,
if any.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git init --shared=0xxx, where '0xxx' is an octal number, will create
a repository with file modes set to '0xxx'. Users with a safe umask
value (0077) can use this option to force file modes. For example,
'0640' is a group-readable but not group-writable regardless of
user's umask value. Values compatible with old Git versions are written
as they were before, for compatibility reasons. That is, "1" for
"group" and "2" for "everybody".
"git config core.sharedRepository 0xxx" is also handled.
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new function can be used by config parsers to tell if a variable
is simply set, set to 1, or set to "true".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch allows .git to be a regular textfile containing the path of
the real git directory (prefixed with "gitdir: "), which can be useful on
platforms lacking support for real symlinks.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This expands on the previous patch, and allows "git add" to sanely handle
a filename that has changed case, keeping the case in the index constant,
and avoiding aliases.
In particular, if you have an index entry called "File", but the
checked-out tree is case-corrupted and has an entry called "file"
instead, doing a
git add .
(or naming "file" explicitly) will automatically notice that we have an
alias, and will replace the name "file" with the existing index
capitalization (ie "File").
However, if we actually have *both* a file called "File" and one called
"file", and they don't have the same lstat() information (ie we're on a
case-sensitive filesystem but have the "core.ignorecase" flag set), we
will error out if we try to add them both.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
..and start using it for directory entry traversal (ie "git status" will
not consider entries that match an existing entry case-insensitively to
be a new file)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now nobody uses it, but "index_name_exists()" gets a flag so
you can enable it on a case-by-case basis.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows verify_absent() in unpack_trees() to use the hash chains
rather than looking it up using the binary search.
Perhaps more importantly, it's also going to be useful for the next phase,
where we actually start looking at the cache entry when we do
case-insensitive lookups and checking the result.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's really totally separate functionality, and if we want to start
doing case-insensitive hash lookups, I'd rather do it when it's
separated out.
It also renames "remove_index_entry()" to "remove_name_hash()", because
that really describes the thing better. It doesn't actually remove the
index entry, that's done by "remove_index_entry_at()", which is something
very different, despite the similarity in names.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lt/unpack-trees:
unpack_trees(): fix diff-index regression.
traverse_trees_recursive(): propagate merge errors up
unpack_trees(): minor memory leak fix in unused destination index
Make 'unpack_trees()' have a separate source and destination index
Make 'unpack_trees()' take the index to work on as an argument
Add 'const' where appropriate to index handling functions
Fix tree-walking compare_entry() in the presense of --prefix
Move 'unpack_trees()' over to 'traverse_trees()' interface
Make 'traverse_trees()' traverse conflicting DF entries in parallel
Add return value to 'traverse_tree()' callback
Make 'traverse_tree()' use linked structure rather than 'const char *base'
Add 'df_name_compare()' helper function
This is in an effort to make the source index of 'unpack_trees()' as
being const, and thus making the compiler help us verify that we only
access it for reading.
The constification also extended to some of the hashing helpers that get
called indirectly.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new helper is identical to base_name_compare(), except it compares
conflicting directory/file entries as equal in order to help handling DF
conflicts (thus the name).
Note that while a directory name compares as equal to a regular file
with the new helper, they then individually compare _differently_ to a
filename that has a dot after the basename (because '\0' < '.' < '/').
So a directory called "foo/" will compare equal to a file "foo", even
though "foo.c" will compare after "foo" and before "foo/"
This will be used by routines that want to traverse the git namespace
but then handle conflicting entries together when possible.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mk/maint-parse-careful:
receive-pack: use strict mode for unpacking objects
index-pack: introduce checking mode
unpack-objects: prevent writing of inconsistent objects
unpack-object: cache for non written objects
add common fsck error printing function
builtin-fsck: move common object checking code to fsck.c
builtin-fsck: reports missing parent commits
Remove unused object-ref code
builtin-fsck: move away from object-refs to fsck_walk
add generic, type aware object chain walker
Conflicts:
Makefile
builtin-fsck.c
* db/checkout: (21 commits)
checkout: error out when index is unmerged even with -m
checkout: show progress when checkout takes long time while switching branches
Add merge-subtree back
checkout: updates to tracking report
builtin-checkout.c: Remove unused prefix arguments in switch_branches path
checkout: work from a subdirectory
checkout: tone down the "forked status" diagnostic messages
Clean up reporting differences on branch switch
builtin-checkout.c: fix possible usage segfault
checkout: notice when the switched branch is behind or forked
Build in checkout
Move code to clean up after a branch change to branch.c
Library function to check for unmerged index entries
Use diff -u instead of diff in t7201
Move create_branch into a library file
Build-in merge-recursive
Add "skip_unmerged" option to unpack_trees.
Discard "deleted" cache entries after using them to update the working tree
Send unpack-trees debugging output to stderr
Add flag to make unpack_trees() not print errors.
...
Conflicts:
Makefile
* maint:
Documentation/git-am.txt: Pass -r in the example invocation of rm -f .dotest
timezone_names[]: fixed the tz offset for New Zealand.
filter-branch documentation: non-zero exit status in command abort the filter
rev-parse: fix potential bus error with --parseopt option spec handling
Use a single implementation and API for copy_file()
Documentation/git-filter-branch: add a new msg-filter example
Correct fast-export file mode strings to match fast-import standard
The requirements are:
* it may not crash on NULL pointers
* a callback function is needed, as index-pack/unpack-objects
need to do different things
* the type information is needed to check the expected <-> real type
and print better error messages
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Originally by Kristian Hï¿œgsberg; I fixed the conversion of rerere, which
had a different API.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This converts git_config_alias to the public alias_lookup
function. Because of the nature of our config parser, we
still have to rely on setting static data. However, that
interface is wrapped so that you can just say
value = alias_lookup(key);
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/apply-whitespace:
ws_fix_copy(): move the whitespace fixing function to ws.c
apply: do not barf on patch with too large an offset
core.whitespace: cr-at-eol
git-apply --whitespace=fix: fix whitespace fuzz introduced by previous run
builtin-apply.c: pass ws_rule down to match_fragment()
builtin-apply.c: move copy_wsfix() function a bit higher.
builtin-apply.c: do not feed copy_wsfix() leading '+'
builtin-apply.c: simplify calling site to apply_line()
builtin-apply.c: clean-up apply_one_fragment()
builtin-apply.c: mark common context lines in lineinfo structure.
builtin-apply.c: optimize match_beginning/end processing a bit.
builtin-apply.c: make it more line oriented
builtin-apply.c: push match-beginning/end logic down
builtin-apply.c: restructure "offset" matching
builtin-apply.c: refactor small part that matches context
We used to just memcpy() the index entry when we copied the stat() and
SHA1 hash information, which worked well enough back when the index
entry was just an exact bit-for-bit representation of the information on
disk.
However, these days we actually have various management information in
the cache entry too, and we should be careful to not overwrite it when
we copy the stat information from another index entry.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the name hash removal function (which really just sets the
bit that disables lookups of it) available to external routines, and
makes read_cache_unmerged() use it when it drops an unmerged entry from
the index.
It's renamed to remove_index_entry(), and we drop the (unused) 'istate'
argument.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We handled the case of removing and re-inserting cache entries badly,
which is something that merging commonly needs to do (removing the
different stages, and then re-inserting one of them as the merged
state).
We even had a rather ugly special case for this failure case, where
replace_index_entry() basically turned itself into a no-op if the new
and the old entries were the same, exactly because the hash routines
didn't handle it on their own.
So what this patch does is to not just have the UNHASHED bit, but a
HASHED bit too, and when you insert an entry into the name hash, that
involves:
- clear the UNHASHED bit, because now it's valid again for lookup
(which is really all that UNHASHED meant)
- if we're being lazy, we're done here (but we still want to clear the
UNHASHED bit regardless of lazy mode, since we can become unlazy
later, and so we need the UNHASHED bit to always be set correctly,
even if we never actually insert the entry into the hash list)
- if it was already hashed, we just leave it on the list
- otherwise mark it HASHED and insert it into the list
this all means that unhashing and rehashing a name all just works
automatically. Obviously, you cannot change the name of an entry (that
would be a serious bug), but nothing can validly do that anyway (you'd
have to allocate a new struct cache_entry anyway since the name length
could change), so that's not a new limitation.
The code actually gets simpler in many ways, although the lazy hashing
does mean that there are a few odd cases (ie something can be marked
unhashed even though it was never on the hash in the first place, and
isn't actually marked hashed!).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git branch" and "git checkout -b" now honor --track option even when
the upstream branch is local. Previously --track was silently ignored
when forking from a local branch. Also the command did not error out
when --track was explicitly asked for but the forked point specified
was not an existing branch (i.e. when there is no way to set up the
tracking configuration), but now it correctly does.
The configuration setting branch.autosetupmerge can now be set to
"always", which is equivalent to using --track from the command line.
Setting branch.autosetupmerge to "true" will retain the former behavior
of only setting up branch.*.merge for remote upstream branches.
Includes test cases for the new functionality.
Signed-off-by: Jay Soffian <jaysoffian@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
commit: discard index after setting up partial commit
filter-branch: handle filenames that need quoting
diff: Fix miscounting of --check output
hg-to-git: fix parent analysis
mailinfo: feed only one line to handle_filter() for QP input
diff.c: add "const" qualifier to "char *cmd" member of "struct ll_diff_driver"
Add "const" qualifier to "char *excludes_file".
Add "const" qualifier to "char *editor_program".
Add "const" qualifier to "char *pager_program".
config: add 'git_config_string' to refactor string config variables.
diff.c: remove useless check for value != NULL
fast-import: check return value from unpack_entry()
Validate nicknames of remote branches to prohibit confusing ones
diff.c: replace a 'strdup' with 'xstrdup'.
diff.c: fixup garding of config parser from value=NULL
Also use "git_config_string" to simplify "config.c" code
where "excludes_file" is set.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also use "git_config_string" to simplify "config.c" code
where "editor_program" is set.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also use "git_config_string" to simplify "config.c" code
where "pager_program" is set.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In many places we just check if a value from the config file is not
NULL, then we duplicate it and return 0. This patch introduces the new
'git_config_string' function to do that.
This function is also used to refactor some code in 'config.c'.
Refactoring other files is left for other patches.
Also not all the code in "config.c" is refactored, because the function
takes a "const char **" as its first parameter, but in many places a
"char *" is used instead of a "const char *". (And C does not allow
using a "char **" instead of a "const char **" without a warning.)
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lt/in-core-index:
lazy index hashing
Create pathname-based hash-table lookup into index
read-cache.c: introduce is_racy_timestamp() helper
read-cache.c: fix a couple more CE_REMOVE conversion
Also use unpack_trees() in do_diff_cache()
Make run_diff_index() use unpack_trees(), not read_tree()
Avoid running lstat(2) on the same cache entry.
index: be careful when handling long names
Make on-disk index representation separate from in-core one
This is used to report misconfigured configuration file that does not
give any value to a non-boolean variable, e.g.
[section]
var
It is perfectly fine to say it if the section.var is a boolean (it means
true), but if a variable expects a string value it should be flagged as
a configuration error.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GIT_CONFIG_NOGLOBAL and GIT_CONFIG_NOSYSTEM environment
variables are magic undocumented switches that can be used
to ensure a totally clean environment. This is necessary for
running reliable tests, since those config files may contain
settings that change the outcome of tests.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CRLF conversion bears a slight chance of corrupting data.
autocrlf=true will convert CRLF to LF during commit and LF to
CRLF during checkout. A file that contains a mixture of LF and
CRLF before the commit cannot be recreated by git. For text
files this is the right thing to do: it corrects line endings
such that we have only LF line endings in the repository.
But for binary files that are accidentally classified as text the
conversion can corrupt data.
If you recognize such corruption early you can easily fix it by
setting the conversion type explicitly in .gitattributes. Right
after committing you still have the original file in your work
tree and this file is not yet corrupted. You can explicitly tell
git that this file is binary and git will handle the file
appropriately.
Unfortunately, the desired effect of cleaning up text files with
mixed line endings and the undesired effect of corrupting binary
files cannot be distinguished. In both cases CRLFs are removed
in an irreversible way. For text files this is the right thing
to do because CRLFs are line endings, while for binary files
converting CRLFs corrupts data.
This patch adds a mechanism that can either warn the user about
an irreversible conversion or can even refuse to convert. The
mechanism is controlled by the variable core.safecrlf, with the
following values:
- false: disable safecrlf mechanism
- warn: warn about irreversible conversions
- true: refuse irreversible conversions
The default is to warn. Users are only affected by this default
if core.autocrlf is set. But the current default of git is to
leave core.autocrlf unset, so users will not see warnings unless
they deliberately chose to activate the autocrlf mechanism.
The safecrlf mechanism's details depend on the git command. The
general principles when safecrlf is active (not false) are:
- we warn/error out if files in the work tree can modified in an
irreversible way without giving the user a chance to backup the
original file.
- for read-only operations that do not modify files in the work tree
we do not not print annoying warnings.
There are exceptions. Even though...
- "git add" itself does not touch the files in the work tree, the
next checkout would, so the safety triggers;
- "git apply" to update a text file with a patch does touch the files
in the work tree, but the operation is about text files and CRLF
conversion is about fixing the line ending inconsistencies, so the
safety does not trigger;
- "git diff" itself does not touch the files in the work tree, it is
often run to inspect the changes you intend to next "git add". To
catch potential problems early, safety triggers.
The concept of a safety check was originally proposed in a similar
way by Linus Torvalds. Thanks to Dimitry Potapov for insisting
on getting the naked LF/autocrlf=true case right.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
A pattern "foo/" in the exclude list did not match directory
"foo", but a pattern "foo" did. This attempts to extend the
exclude mechanism so that it would while not matching a regular
file or a symbolic link "foo". In order to differentiate a
directory and non directory, this passes down the type of path
being checked to excluded() function.
A downside is that the recursive directory walk may need to run
lstat(2) more often on systems whose "struct dirent" do not give
the type of the entry; earlier it did not have to do so for an
excluded path, but we now need to figure out if a path is a
directory before deciding to exclude it. This is especially bad
because an idea similar to the earlier CE_UPTODATE optimization
to reduce number of lstat(2) calls would by definition not apply
to the codepaths involved, as (1) directories will not be
registered in the index, and (2) excluded paths will not be in
the index anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new error mode allows a line to have a carriage return at the
end of the line when checking and fixing trailing whitespace errors.
Some people like to keep CRLF line ending recorded in the repository,
and still want to take advantage of the automated trailing whitespace
stripping. We still show ^M in the diff output piped to "less" to
remind them that they do have the CR at the end, but these carriage
return characters at the end are no longer flagged as errors.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This creates a hash index of every single file added to the index.
Right now that hash index isn't actually used for much: I implemented a
"cache_name_exists()" function that uses it to efficiently look up a
filename in the index without having to do the O(logn) binary search,
but quite frankly, that's not why this patch is interesting.
No, the whole and only reason to create the hash of the filenames in the
index is that by modifying the hash function, you can fairly easily do
things like making it always hash equivalent names into the same bucket.
That, in turn, means that suddenly questions like "does this name exist
in the index under an _equivalent_ name?" becomes much much cheaper.
Guiding principles behind this patch:
- it shouldn't be too costly. In fact, my primary goal here was to
actually speed up "git commit" with a fully populated kernel tree, by
being faster at checking whether a file already existed in the index. I
did succeed, but only barely:
Best before:
[torvalds@woody linux]$ time git commit > /dev/null
real 0m0.255s
user 0m0.168s
sys 0m0.088s
Best after:
[torvalds@woody linux]$ time ~/git/git commit > /dev/null
real 0m0.233s
user 0m0.144s
sys 0m0.088s
so some things are actually faster (~8%).
Caveat: that's really the best case. Other things are invariably going
to be slightly slower, since we populate that index cache, and quite
frankly, few things really use it to look things up.
That said, the cost is really quite small. The worst case is probably
doing a "git ls-files", which will do very little except puopulate the
index, and never actually looks anything up in it, just lists it.
Before:
[torvalds@woody linux]$ time git ls-files > /dev/null
real 0m0.016s
user 0m0.016s
sys 0m0.000s
After:
[torvalds@woody linux]$ time ~/git/git ls-files > /dev/null
real 0m0.021s
user 0m0.012s
sys 0m0.008s
and while the thing has really gotten relatively much slower, we're
still talking about something almost unmeasurable (eg 5ms). And that
really should be pretty much the worst case.
So we lose 5ms on one "benchmark", but win 22ms on another. Pick your
poison - this patch has the advantage that it will _likely_ speed up
the cases that are complex and expensive more than it slows down the
cases that are already so fast that nobody cares. But if you look at
relative speedups/slowdowns, it doesn't look so good.
- It should be simple and clean
The code may be a bit subtle (the reasons I do hash removal the way I
do etc), but it re-uses the existing hash.c files, so it really is
fairly small and straightforward apart from a few odd details.
Now, this patch on its own doesn't really do much, but I think it's worth
looking at, if only because if done correctly, the name hashing really can
make an improvement to the whole issue of "do we have a filename that
looks like this in the index already". And at least it gets real testing
by being used even by default (ie there is a real use-case for it even
without any insane filesystems).
NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just
not ashamed of it enough to really care. I took all the numbers out of my
nether regions - I'm sure it's good enough that it works in practice, but
the whole point was that you can make a really much fancier hash that
hashes characters not directly, but by their upper-case value or something
like that, and thus you get a case-insensitive hash, while still keeping
the name and the index itself totally case sensitive.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Aside from the lstat(2) done for work tree files, there are
quite many lstat(2) calls in refname dwimming codepath. This
patch is not about reducing them.
* It adds a new ce_flag, CE_UPTODATE, that is meant to mark the
cache entries that record a regular file blob that is up to
date in the work tree. If somebody later walks the index and
wants to see if the work tree has changes, they do not have
to be checked with lstat(2) again.
* fill_stat_cache_info() marks the cache entry it just added
with CE_UPTODATE. This has the effect of marking the paths
we write out of the index and lstat(2) immediately as "no
need to lstat -- we know it is up-to-date", from quite a lot
fo callers:
- git-apply --index
- git-update-index
- git-checkout-index
- git-add (uses add_file_to_index())
- git-commit (ditto)
- git-mv (ditto)
* refresh_cache_ent() also marks the cache entry that are clean
with CE_UPTODATE.
* write_index is changed not to write CE_UPTODATE out to the
index file, because CE_UPTODATE is meant to be transient only
in core. For the same reason, CE_UPDATE is not written to
prevent an accident from happening.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We currently use lower 12-bit (masked with CE_NAMEMASK) in the
ce_flags field to store the length of the name in cache_entry,
without checking the length parameter given to
create_ce_flags(). This can make us store incorrect length.
Currently we are mostly protected by the fact that many
codepaths first copy the path in a variable of size PATH_MAX,
which typically is 4096 that happens to match the limit, but
that feels like a bug waiting to happen. Besides, that would
not allow us to shorten the width of CE_NAMEMASK to use the bits
for new flags.
This redefines the meaning of the name length stored in the
cache_entry. A name that does not fit is represented by storing
CE_NAMEMASK in the field, and the actual length needs to be
computed by actually counting the bytes in the name[] field.
This way, only the unusually long paths need to suffer.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This converts the index explicitly on read and write to its on-disk
format, allowing the in-core format to contain more flags, and be
simpler.
In particular, the in-core format is now host-endian (as opposed to the
on-disk one that is network endian in order to be able to be shared
across machines) and as a result we can dispense with all the
htonl/ntohl on accesses to the cache_entry fields.
This will make it easier to make use of various temporary flags that do
not exist in the on-disk format.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fast-import was relying on the fact that on most systems mmap() and
write() are synchronized by the filesystem's buffer cache. We were
relying on the ability to mmap() 20 bytes beyond the current end
of the file, then later fill in those bytes with a future write()
call, then read them through the previously obtained mmap() address.
This isn't always true with some implementations of NFS, but it is
especially not true with our NO_MMAP=YesPlease build time option used
on some platforms. If fast-import was built with NO_MMAP=YesPlease
we used the malloc()+pread() emulation and the subsequent write()
call does not update the trailing 20 bytes of a previously obtained
"mmap()" (aka malloc'd) address.
Under NO_MMAP that behavior causes unpack_entry() in sha1_file.c to
be unable to read an object header (or data) that has been unlucky
enough to be written to the packfile at a location such that it
is in the trailing 20 bytes of a window previously opened on that
same packfile.
This bug has gone unnoticed for a very long time as it is highly data
dependent. Not only does the object have to be placed at the right
position, but it also needs to be positioned behind some other object
that has been accessed due to a branch cache invalidation. In other
words the stars had to align just right, and if you did run into
this bug you probably should also have purchased a lottery ticket.
Fortunately the workaround is a lot easier than the bug explanation.
Before we allow unpack_entry() to read data from a pack window
that has also (possibly) been modified through write() we force
all existing windows on that packfile to be closed. By closing
the windows we ensure that any new access via the emulated mmap()
will reread the packfile, updating to the current file content.
This comes at a slight performance degredation as we cannot reuse
previously cached windows when we update the packfile. But it
is a fairly minor difference as the window closes happen at only
two points:
- When the packfile is finalized and its .idx is generated:
At this stage we are getting ready to update the refs and any
data access into the packfile is going to be random, and is
going after only the branch tips (to ensure they are valid).
Our existing windows (if any) are not likely to be positioned
at useful locations to access those final tip commits so we
probably were closing them before anyway.
- When the branch cache missed and we need to reload:
At this point fast-import is getting change commands for the next
commit and it needs to go re-read a tree object it previously
had written out to the packfile. What windows we had (if any)
are not likely to cover the tree in question so we probably were
closing them before anyway.
We do try to avoid unnecessarily closing windows in the second case
by checking to see if the packfile size has increased since the
last time we called unpack_entry() on that packfile. If the size
has not changed then we have not written additional data, and any
existing window is still vaild. This nicely handles the cases where
fast-import is going through a branch cache reload and needs to read
many trees at once. During such an event we are not likely to be
updating the packfile so we do not cycle the windows between reads.
With this change in place t9301-fast-export.sh (which was broken
by c3b0dec509) finally works again.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lockfile API is a handy way to obtain a file that is cleaned
up if you die(). But sometimes you would need this sequence to
work:
1. hold_lock_file_for_update() to get a file descriptor for
writing;
2. write the contents out, without being able to decide if the
results should be committed or rolled back;
3. do something else that makes the decision --- and this
"something else" needs the lockfile not to have an open file
descriptor for writing (e.g. Windows do not want a open file
to be renamed);
4. call commit_lock_file() or rollback_lock_file() as
appropriately.
This adds close_lock_file() you can call between step 2 and 3 in
the above sequence.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit unifies three separate places where whitespace checking was
performed:
- the whitespace checking previously done in builtin-apply.c is
extracted into a function in ws.c
- the equivalent logic in "git diff" is removed
- the emit_line_with_ws() function is also removed because that also
rechecks the whitespace, and its functionality is rolled into ws.c
The new function is called check_and_emit_line() and it does two things:
checks a line for whitespace errors and optionally emits it. The checking
is based on lines of content rather than patch lines (in other words, the
caller must strip the leading "+" or "-"); this was suggested by Junio on
the mailing list to allow for a future extension to "git show" to display
whitespace errors in blobs.
At the same time we teach it to report all classes of whitespace errors
found for a given line rather than reporting only the first found error.
Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When deciding whether or not to turn on automatic color
support, git_config_colorbool checks whether stdout is a
tty. However, because we run a pager, if stdout is not a
tty, we must check whether it is because we started the
pager. This used to be done by checking the pager_in_use
variable.
This variable was set only when the git program being run
started the pager; there was no way for an external program
running git indicate that it had already started a pager.
This patch allows a program to set GIT_PAGER_IN_USE to a
true value to indicate that even though stdout is not a tty,
it is because a pager is being used.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/spht:
Use gitattributes to define per-path whitespace rule
core.whitespace: documentation updates.
builtin-apply: teach whitespace_rules
builtin-apply: rename "whitespace" variables and fix styles
core.whitespace: add test for diff whitespace error highlighting
git-diff: complain about >=8 consecutive spaces in initial indent
War on whitespace: first, a bit of retreat.
Conflicts:
cache.h
config.c
diff.c
An earlier fix to the said commit was incomplete; it mixed up the
meaning of the flag parameter passed to the internal fmt_ident()
function, so this corrects it.
git_author_info() and git_committer_info() can be told to issue a
warning when no usable user information is found, and optionally can be
told to error out. Operations that actually use the information to
record a new commit or a tag will still error out, but the caller to
leave reflog record will just silently use bogus user information.
Not warning on misconfigured user information while writing a reflog
entry is somewhat debatable, but it is probably nicer to the users to
silently let it pass, because the only information you are losing is who
checked out the branch.
* git_author_info() and git_committer_info() used to take 1 (positive
int) to error out with a warning on misconfiguration; this is now
signalled with a symbolic constant IDENT_ERROR_ON_NO_NAME.
* These functions used to take -1 (negative int) to warn but continue;
this is now signalled with a symbolic constant IDENT_WARN_ON_NO_NAME.
* fmt_ident() function implements the above error reporting behaviour
common to git_author_info() and git_committer_info(). A symbolic
constant IDENT_NO_DATE can be or'ed in to the flag parameter to make
it return only the "Name <email@address.xz>".
* fmt_name() is a thin wrapper around fmt_ident() that always passes
IDENT_ERROR_ON_NO_NAME and IDENT_NO_DATE.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `core.whitespace` configuration variable allows you to define what
`diff` and `apply` should consider whitespace errors for all paths in
the project (See gitlink:git-config[1]). This attribute gives you finer
control per path.
For example, if you have these in the .gitattributes:
frotz whitespace
nitfol -whitespace
xyzzy whitespace=-trailing
all types of whitespace problems known to git are noticed in path 'frotz'
(i.e. diff shows them in diff.whitespace color, and apply warns about
them), no whitespace problem is noticed in path 'nitfol', and the
default types of whitespace problems except "trailing whitespace" are
noticed for path 'xyzzy'. A project with mixed Python and C might want
to have:
*.c whitespace
*.py whitespace=-indent-with-non-tab
in its toplevel .gitattributes file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kh/commit: (33 commits)
git-commit --allow-empty
git-commit: Allow to amend a merge commit that does not change the tree
quote_path: fix collapsing of relative paths
Make git status usage say git status instead of git commit
Fix --signoff in builtin-commit differently.
git-commit: clean up die messages
Do not generate full commit log message if it is not going to be used
Remove git-status from list of scripts as it is builtin
Fix off-by-one error when truncating the diff out of the commit message.
builtin-commit.c: export GIT_INDEX_FILE for launch_editor as well.
Add a few more tests for git-commit
builtin-commit: Include the diff in the commit message when verbose.
builtin-commit: fix partial-commit support
Fix add_files_to_cache() to take pathspec, not user specified list of files
Export three helper functions from ls-files
builtin-commit: run commit-msg hook with correct message file
builtin-commit: do not color status output shown in the message template
file_exists(): dangling symlinks do exist
Replace "runstatus" with "status" in the tests
t7501-commit: Add test for git commit <file> with dirty index.
...
* sp/refspec-match:
refactor fetch's ref matching to use refname_match()
push: use same rules as git-rev-parse to resolve refspecs
add refname_match()
push: support pushing HEAD to real branch name
Now that str_buf takes care of all the allocations, there is
no more gain to pass an argument count.
So this patch removes the "count" argument from:
- "sq_quote_argv"
- "trace_argv_printf"
and all the callers.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce fmt_name() specifically meant for formatting the name and
email pair, to add signed-off-by value. This reverts parts of
13208572fb (builtin-commit: fix --signoff)
so that an empty datestamp string given to fmt_ident() by mistake will
error out as before.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we consider if a path has been totally rewritten, we did not
touch changes from symlinks to files or vice versa. But a change
that modifies even the type of a blob surely should count as a
complete rewrite.
While we are at it, modernise diffcore-break to be aware of gitlinks (we
do not want to touch them).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/send-pack: (24 commits)
send-pack: cluster ref status reporting
send-pack: fix "everything up-to-date" message
send-pack: tighten remote error reporting
make "find_ref_by_name" a public function
Fix warning about bitfield in struct ref
send-pack: assign remote errors to each ref
send-pack: check ref->status before updating tracking refs
send-pack: track errors for each ref
git-push: add documentation for the newly added --mirror mode
Add tests for git push'es mirror mode
Update the tracking references only if they were succesfully updated on remote
Add a test checking if send-pack updated local tracking branches correctly
git-push: plumb in --mirror mode
Teach send-pack a mirror mode
send-pack: segfault fix on forced push
Reteach builtin-ls-remote to understand remotes
send-pack: require --verbose to show update of tracking refs
receive-pack: don't mention successful updates
more terse push output
Build in ls-remote
...
This separates the logic to limit the extent of change to the
index by where you are (controlled by "prefix") and what you
specify from the command line (controlled by "pathspec").
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This exports three helper functions from ls-files.
* pathspec_match() checks if a given path matches a set of pathspecs
and optionally records which pathspec was used. This function used
to be called "match()" but renamed to be a bit less vague.
* report_path_error() takes a set of pathspecs and the record
pathspec_match() above leaves, and gives error message. This
was split out of the main function of ls-files.
* overlay_tree_on_cache() takes a tree-ish (typically "HEAD")
and overlays it on the current in-core index. By iterating
over the resulting index, the caller can find out the paths
in either the index or the HEAD. This function used to be
called "overlay_tree()" but renamed to be a bit more
descriptive.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old rules used by fetch were coded as a series of ifs. The old
rules are:
1) match full refname if it starts with "refs/" or matches "HEAD"
2) verify that full refname starts with "refs/"
3) match abbreviated name in "refs/" if it starts with "heads/",
"tags/", or "remotes/".
4) match abbreviated name in "refs/heads/"
This is replaced by the new rules
a) match full refname
b) match abbreviated name prefixed with "refs/"
c) match abbreviated name prefixed with "refs/heads/"
The details of the new rules are different from the old rules. We no
longer verify that the full refname starts with "refs/". The new rule
(a) matches any full string. The old rules (1) and (2) were stricter.
Now, the caller is responsible for using sensible full refnames. This
should be the case for the current code. The new rule (b) is less
strict than old rule (3). The new rule accepts abbreviated names that
start with a non-standard prefix below "refs/".
Despite this modifications the new rules should handle all cases as
expected. Two tests are added to verify that fetch does not resolve
short tags or HEAD in remotes.
We may even think about loosening the rules a bit more and unify them
with the rev-parse rules. This would be done by replacing
ref_ref_fetch_rules with ref_ref_parse_rules. Note, the two new test
would break.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use at least two rulesets for matching abbreviated refnames with
full refnames (starting with 'refs/'). git-rev-parse and git-fetch
use slightly different rules.
This commit introduces a new function refname_match
(const char *abbrev_name, const char *full_name, const char **rules).
abbrev_name is expanded using the rules and matched against full_name.
If a match is found the function returns true. rules is a NULL-terminate
list of format patterns with "%.*s", for example:
const char *ref_rev_parse_rules[] = {
"%.*s",
"refs/%.*s",
"refs/tags/%.*s",
"refs/heads/%.*s",
"refs/remotes/%.*s",
"refs/remotes/%.*s/HEAD",
NULL
};
Asterisks are included in the format strings because this is the form
required in sha1_name.c. Sharing the list with the functions there is
a good idea to avoid duplicating the rules. Hopefully this
facilitates unified matching rules in the future.
This commit makes the rules used by rev-parse for resolving refs to
sha1s available for string comparison. Before this change, the rules
were buried in get_sha1*() and dwim_ref().
A follow-up commit will refactor the rules used by fetch.
refname_match() will be used for matching refspecs in git-send-pack.
Thanks to Daniel Barkalow <barkalow@iabervon.org> for pointing
out that ref_matches_abbrev in remote.c solves a similar problem
and care should be taken to avoid confusion.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, we set all ref pushes to 'OK', and then marked
them as errors if the remote reported so. This has the
problem that if the remote dies or fails to report a ref, we
just assume it was OK.
Instead, we use a new non-OK state to indicate that we are
expecting status (if the remote doesn't support the
report-status feature, we fall back on the old behavior).
Thus we can flag refs for which we expected a status, but
got none (conversely, we now also print a warning for refs
for which we get a status, but weren't expecting one).
This also allows us to simplify the receive_status exit
code, since each ref is individually marked with failure
until we get a success response. We can just print the usual
status table, so the user still gets a sense of what we were
trying to do when the failure happened.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache.h:503: warning: type of bit-field 'force' is a GCC extension
cache.h:504: warning: type of bit-field 'merge' is a GCC extension
cache.h:505: warning: type of bit-field 'nonfastforward' is a GCC extension
cache.h:506: warning: type of bit-field 'deletion' is a GCC extension
So we change it to an 'unsigned int' which is not a GCC extension.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This lets us show remote errors (e.g., a denied hook) along
with the usual push output.
There is a slightly clever optimization in receive_status
that bears explanation. We need to correlate the returned
status and our ref objects, which naively could be an O(m*n)
operation. However, since the current implementation of
receive-pack returns the errors to us in the same order that
we sent them, we optimistically look for the next ref to be
looked up to come after the last one we have found. So it
should be an O(m+n) merge if the receive-pack behavior
holds, but we fall back to a correct but slower behavior if
it should change.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of keeping the 'ret' variable, we instead have a
status flag for each ref that tracks what happened to it.
We then print the ref status after all of the refs have
been examined.
This paves the way for three improvements:
- updating tracking refs only for non-error refs
- incorporating remote rejection into the printed status
- printing errors in a different order than we processed
(e.g., consolidating non-ff errors near the end with
a special message)
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Alex Riesen <raa.lkml@gmail.com>
Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent patch the path to the system-wide config file will be
computed. This is a preparation for that change. It turns all accesses
of ETC_GITCONFIG into function calls. There is no change in behavior.
As a consequence, config.c is the only file that needs the definition of
ETC_GITCONFIG. Hence, -DETC_GITCONFIG is removed from the CFLAGS and a
special build rule for config.c is introduced. As a side-effect, changing
the defintion of ETC_GITCONFIG (e.g. in config.mak) does not trigger a
complete rebuild anymore.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is needed on Windows since open files cannot be unlinked.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are inconsistencies in the way commands currently handle
the core.excludesfile configuration variable. The problem is
the variable is too new to be noticed by anything other than
git-add and git-status.
* git-ls-files does not notice any of the "ignore" files by
default, as it predates the standardized set of ignore files.
The calling scripts established the convention to use
.git/info/exclude, .gitignore, and later core.excludesfile.
* git-add and git-status know about it because they call
add_excludes_from_file() directly with their own notion of
which standard set of ignore files to use. This is just a
stupid duplication of code that need to be updated every time
the definition of the standard set of ignore files is
changed.
* git-read-tree takes --exclude-per-directory=<gitignore>,
not because the flexibility was needed. Again, this was
because the option predates the standardization of the ignore
files.
* git-merge-recursive uses hardcoded per-directory .gitignore
and nothing else. git-clean (scripted version) does not
honor core.* because its call to underlying ls-files does not
know about it. git-clean in C (parked in 'pu') doesn't either.
We probably could change git-ls-files to use the standard set
when no excludes are specified on the command line and ignore
processing was asked, or something like that, but that will be a
change in semantics and might break people's scripts in a subtle
way. I am somewhat reluctant to make such a change.
On the other hand, I think it makes perfect sense to fix
git-read-tree, git-merge-recursive and git-clean to follow the
same rule as other commands. I do not think of a valid use case
to give an exclude-per-directory that is nonstandard to
read-tree command, outside a "negative" test in the t1004 test
script.
This patch is the first step to untangle this mess.
The next step would be to teach read-tree, merge-recursive and
clean (in C) to use setup_standard_excludes().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-add-sync-stat:
t2200: test more cases of "add -u"
git-add: make the entry stat-clean after re-adding the same contents
ce_match_stat, run_diff_files: use symbolic constants for readability
Conflicts:
builtin-add.c
* db/remote-builtin:
Reteach builtin-ls-remote to understand remotes
Build in ls-remote
Use built-in send-pack.
Build-in send-pack, with an API for other programs to call.
Build-in peek-remote, using transport infrastructure.
Miscellaneous const changes and utilities
Conflicts:
transport.c
ce_match_stat() can be told:
(1) to ignore CE_VALID bit (used under "assume unchanged" mode)
and perform the stat comparison anyway;
(2) not to perform the contents comparison for racily clean
entries and report mismatch of cached stat information;
using its "option" parameter. Give them symbolic constants.
Similarly, run_diff_files() can be told not to report anything
on removed paths. Also give it a symbolic constant for that.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ZLIB_VERNUM isn't defined in some zlib versions, so this patch does a proper
linking test in autoconf to see whether deflateBound exists in zlib. Also,
setting NO_DEFLATE_BOUND will also work for folk not using autoconf.
Signed-off-by: David Symonds <dsymonds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a setup_work_tree() that can be used from any command requiring
a working tree conditionally.
Signed-off-by: Mike Hommey <mh@glandium.org>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The list of remote refs in struct transport should be const, because
builtin-fetch will get confused if it changes.
The url in git_connect should be const (and work on a copy) instead of
requiring the caller to copy it.
match_refs doesn't modify the refspecs it gets.
get_fetch_map and get_remote_ref don't change the list they get.
Allow transport get_refs_list methods to modify the struct transport.
Add a function to copy a list of refs, when a function needs a mutable
copy of a const list.
Add a function to check the type of a ref, as per the code in connect.c
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This introduces a new whitespace error type, "indent-with-non-tab".
The error is about starting a line with 8 or more SP, instead of
indenting it with a HT.
This is not enabled by default, as some projects employ an
indenting policy to use only SPs and no HTs.
The kernel folks and git contributors may want to enable this
detection with:
[core]
whitespace = indent-with-non-tab
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This introduces core.whitespace configuration variable that lets
you specify the definition of "whitespace error".
Currently there are two kinds of whitespace errors defined:
* trailing-space: trailing whitespaces at the end of the line.
* space-before-tab: a SP appears immediately before HT in the
indent part of the line.
You can specify the desired types of errors to be detected by
listing their names (unique abbreviations are accepted)
separated by comma. By default, these two errors are always
detected, as that is the traditional behaviour. You can disable
detection of a particular type of error by prefixing a '-' in
front of the name of the error, like this:
[core]
whitespace = -trailing-space
This patch teaches the code to output colored diff with
DIFF_WHITESPACE color to highlight the detected whitespace
errors to honor the new configuration.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This prepares the API of git_connect() and finish_connect() to operate on
a struct child_process. Currently, we just use that object as a placeholder
for the pid that we used to return. A follow-up patch will change the
implementation of git_connect() and finish_connect() to make full use
of the object.
Old code had early-return-on-error checks at the calling sites of
git_connect(), but since git_connect() dies on errors anyway, these checks
were removed.
[sp: Corrected style nit of "conn == NULL" to "!conn"]
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
There's a number of tricky conflicts between master and
this topic right now due to the rewrite of builtin-push.
Junio must have handled these via rerere; I'd rather not
deal with them again so I'm pre-merging master into the
topic. Besides this topic somehow started to depend on
the strbuf series that was in next, but is now in master.
It no longer compiles on its own without the strbuf API.
* master: (184 commits)
Whip post 1.5.3.4 maintenance series into shape.
Minor usage update in setgitperms.perl
manual: use 'URL' instead of 'url'.
manual: add some markup.
manual: Fix example finding commits referencing given content.
Fix wording in push definition.
Fix some typos, punctuation, missing words, minor markup.
manual: Fix or remove em dashes.
Add a --dry-run option to git-push.
Add a --dry-run option to git-send-pack.
Fix in-place editing functions in convert.c
instaweb: support for Ruby's WEBrick server
instaweb: allow for use of auto-generated scripts
Add 'git-p4 commit' as an alias for 'git-p4 submit'
hg-to-git speedup through selectable repack intervals
git-svn: respect Subversion's [auth] section configuration values
gtksourceview2 support for gitview
fix contrib/hooks/post-receive-email hooks.recipients error message
Support cvs via git-shell
rebase -i: use diff plumbing instead of porcelain
...
Conflicts:
Makefile
builtin-push.c
rsh.c
* ph/strbuf: (44 commits)
Make read_patch_file work on a strbuf.
strbuf_read_file enhancement, and use it.
strbuf change: be sure ->buf is never ever NULL.
double free in builtin-update-index.c
Clean up stripspace a bit, use strbuf even more.
Add strbuf_read_file().
rerere: Fix use of an empty strbuf.buf
Small cache_tree_write refactor.
Make builtin-rerere use of strbuf nicer and more efficient.
Add strbuf_cmp.
strbuf_setlen(): do not barf on setting length of an empty buffer to 0
sq_quote_argv and add_to_string rework with strbuf's.
Full rework of quote_c_style and write_name_quoted.
Rework unquote_c_style to work on a strbuf.
strbuf API additions and enhancements.
nfv?asprintf are broken without va_copy, workaround them.
Fix the expansion pattern of the pseudo-static path buffer.
builtin-for-each-ref.c::copy_name() - do not overstep the buffer.
builtin-apply.c: fix a tiny leak introduced during xmemdupz() conversion.
Use xmemdupz() in many places.
...
* jc/autogc:
git-gc --auto: run "repack -A -d -l" as necessary.
git-gc --auto: restructure the way "repack" command line is built.
git-gc --auto: protect ourselves from accumulated cruft
git-gc --auto: add documentation.
git-gc --auto: move threshold check to need_to_gc() function.
repack -A -d: use --keep-unreachable when repacking
pack-objects --keep-unreachable
Export matches_pack_name() and fix its return value
Invoke "git gc --auto" from commit, merge, am and rebase.
Implement git gc --auto
Factor out the code to parse --date=<format> parameter to revision
walkers into a separate function, parse_date_format(). This function
is passed a string and converts it to an enum date_format:
- "relative" => DATE_RELATIVE
- "iso8601" or "iso" => DATE_ISO8601
- "rfc2822" => DATE_RFC2822
- "short" => DATE_SHORT
- "local" => DATE_LOCAL
- "default" => DATE_NORMAL
In the event that none of these strings is found, the function die()s.
Signed-off-by: Andy Parkins <andyparkins@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function make_cache_entry() is too useful to be hidden away in
merge-recursive. So move it to libgit.a (exposing it via cache.h).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* drop nfasprintf.
* move nfvasprintf into imap-send.c back, and let it work on a 8k buffer,
and die() in case of overflow. It should be enough for imap commands, if
someone cares about imap-send, he's welcomed to fix it properly.
* replace nfvasprintf use in merge-recursive with a copy of the strbuf_addf
logic, it's one place, we'll live with it.
To ease the change, output_buffer string list is replaced with a strbuf ;)
* rework trace.c to call vsnprintf itself. It's used to format strerror()s
and git command names, it should never be more than a few octets long, let
it work on a 8k static buffer with vsnprintf or die loudly.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Thanks to Johannes Schindelin for review and fixes, and Julian
Phillips for the original C translation.
This changes a few small bits of behavior:
branch.<name>.merge is parsed as if it were the lhs of a fetch
refspec, and does not have to exactly match the actual lhs of a
refspec, so long as it is a valid abbreviation for the same ref.
branch.<name>.merge is no longer ignored if the remote is configured
with a branches/* file. Neither behavior is useful, because there can
only be one ref that gets fetched, but this is more consistant.
Also, fetch prints different information to standard out.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* master: (94 commits)
Fixed update-hook example allow-users format.
Documentation/git-svn: updated design philosophy notes
t/t4014: test "am -3" with mode-only change.
git-commit.sh: Shell script cleanup
preserve executable bits in zip archives
Fix lapsus in builtin-apply.c
git-push: documentation and tests for pushing only branches
git-svnimport: Use separate arguments in the pipe for git-rev-parse
contrib/fast-import: add perl version of simple example
contrib/fast-import: add simple shell example
rev-list --bisect: Bisection "distance" clean up.
rev-list --bisect: Move some bisection code into best_bisection.
rev-list --bisect: Move finding bisection into do_find_bisection.
Document ls-files --with-tree=<tree-ish>
git-commit: partial commit of paths only removed from the index
git-commit: Allow partial commit of file removal.
send-email: make message-id generation a bit more robust
git-apply: fix whitespace stripping
git-gui: Disable native platform text selection in "lists"
apply --index-info: fall back to current index for mode changes
...
* Now, those functions take an "out" strbuf argument, where they store their
result if any. In that case, it also returns 1, else it returns 0.
* those functions support "in place" editing, in the sense that it's OK to
call them this way:
convert_to_git(path, sb->buf, sb->len, sb);
When doable, conversions are done in place for real, else the strbuf
content is just replaced with the new one, transparentely for the caller.
If you want to create a new filter working this way, being the accumulation
of filter1, filter2, ... filtern, then your meta_filter would be:
int meta_filter(..., const char *src, size_t len, struct strbuf *sb)
{
int ret = 0;
ret |= filter1(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
ret |= filter2(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
....
return ret | filtern(..., src, len, sb);
}
That's why subfilters the convert_to_* functions called were also rewritten
to work this way.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function make_cache_entry() is too useful to be hidden away in
merge-recursive. So move it to libgit.a (exposing it via cache.h).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This brings builtin-stripspace, builtin-tag and mktag to use strbufs.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
convert_sha1_file() became unused by the previous patch -- remove it.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The warning message to suggest "Consider running git-status" from
"git-diff" that we experimented with during the 1.5.3 cycle turns
out to be a bad idea. It robbed cache-dirty information from people
who valued it, while still asking users to run "update-index --refresh".
It was hoped that the new behaviour would at least have some educational
value, but not showing the cache-dirty paths like before meant that the
user would not even know easily which paths were cache-dirty, and it
made the need to refresh the index look like even more unnecessary chore.
This commit reinstates the traditional behaviour, but with a twist.
By default, the empty "diff --git" output is totally squelched out
from "git diff" output. At the end of the command, it automatically
runs "update-index --refresh" as needed, without even bothering the
user. In other words, people who do not care about the cache-dirtyness
do not even have to see the warning.
The traditional behaviour to see the stat-dirty output and to bypassing
the overhead of content comparison can be specified by setting the
configuration variable diff.autorefreshindex to false.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows to refresh only a subset of the project files, based on
the specified pathspecs.
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cr/tag:
Teach "git stripspace" the --strip-comments option
Make verify-tag a builtin.
builtin-tag.c: Fix two memory leaks and minor notation changes.
launch_editor(): Heed GIT_EDITOR and core.editor settings
Make git tag a builtin.
The read_tree() function is called only from the call chain to
run "git diff --cached" (this includes the internal call made by
git-runstatus to run_diff_index()). The function vacates stage
without any funky "merge" magic. The caller then goes and
compares stage #1 entries from the tree with stage #0 entries
from the original index.
When adding the cache entries this way, it used the general
purpose add_cache_entry(). This function looks for an existing
entry to replace or if there is none to find where to insert the
new entry, resolves D/F conflict and all the other things.
For the purpose of reading entries into an empty stage, none of
that processing is needed. We can instead append everything and
then sort the result at the end.
This commit changes read_tree() to first make sure that there is
no existing cache entries at specified stage, and if that is the
case, it runs add_cache_entry() with ADD_CACHE_JUST_APPEND flag
(new), and then sort the resulting cache using qsort().
This new flag tells add_cache_entry() to omit all the checks
such as "Does this path already exist? Does adding this path
remove other existing entries because it turns a directory to a
file?" and instead append the given cache entry straight at the
end of the active cache. The caller of course is expected to
sort the resulting cache at the end before using the result.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old version of work-tree support was an unholy mess, barely readable,
and not to the point.
For example, why do you have to provide a worktree, when it is not used?
As in "git status". Now it works.
Another riddle was: if you can have work trees inside the git dir, why
are some programs complaining that they need a work tree?
IOW it is allowed to call
$ git --git-dir=../ --work-tree=. bla
when you really want to. In this case, you are both in the git directory
and in the working tree. So, programs have to actually test for the right
thing, namely if they are inside a working tree, and not if they are
inside a git directory.
Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was
specified, unless there is a repository in the current working directory.
It does now.
The logic to determine if a repository is bare, or has a work tree
(tertium non datur), is this:
--work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true,
which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR
ends in /.git, which overrides the directory in which .git/ was found.
In related news, a long standing bug was fixed: when in .git/bla/x.git/,
which is a bare repository, git formerly assumed ../.. to be the
appropriate git dir. This problem was reported by Shawn Pearce to have
caused much pain, where a colleague mistakenly ran "git init" in "/" a
long time ago, and bare repositories just would not work.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the function set_git_dir() you can reset the path that will
be used for git_path(), git_dir() and friends.
The responsibility to close files and throw away information from the
old git_dir lies with the caller.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch adds convenience functions to work with absolute paths.
The function is_absolute_path() should help the efforts to integrate
the MinGW fork.
Note that make_absolute_path() returns a pointer to a static buffer.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the commit 'Add GIT_EDITOR environment and core.editor
configuration variables', this was done for the shell scripts.
Port it over to builtin-tag's version of launch_editor(), which
is just about to be refactored into editor.c.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new name is closer to the purpose of the function.
A NUL-terminated buffer makes things easier when callers need that.
Since the function returns only the memory written with data,
almost always allocating more space than needed because final
size is unknown, an extra NUL terminating the buffer is harmless.
It is not included in the returned size, so the function
remains working as before.
Also, now the function allows the buffer passed to be NULL at first,
and alloc_nr is now used for growing the buffer, instead size=*2.
Signed-off-by: Carlos Rica <jasampler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These days, show_date() takes a date_mode parameter to specify
the output format, and a separate specialized function for dates
in E-mails does not make much sense anymore.
This retires show_rfc2822_date() function and make it just
another date output format.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Support output of full ISO 8601 style dates in e.g. git log
and other places that use interpolation for formatting.
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split out the nnn{k,m,g} parsing code from git_config_int into
git_parse_long, so command-line parameters can enjoy the same
functionality. Also add get_parse_ulong for unsigned values.
Make git_config_int use git_parse_long, and add get_config_ulong
as well.
Signed-off-by: Brian Downing <bdowning@lavos.net>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a configuration variable that performs the same function as,
but is overridden by, GIT_PAGER.
Signed-off-by: Brian Gernhardt <benji@silverinsanity.com>
Acked-by: Johannes E. Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ei/worktree+filter:
filter-branch: always export GIT_DIR if it is set
setup_git_directory: fix segfault if repository is found in cwd
test GIT_WORK_TREE
extend rev-parse test for --is-inside-work-tree
Use new semantics of is_bare/inside_git_dir/inside_work_tree
introduce GIT_WORK_TREE to specify the work tree
test git rev-parse
rev-parse: introduce --is-bare-repository
rev-parse: document --is-inside-git-dir
This patch arose from a discussion started by Jim Meyering's patch
whose intention was to provide better diagnostics for failed writes.
Linus proposed a better way to do things, which also had the added
benefit that adding a fflush() to git-log-* operations and incremental
git-blame operations could improve interactive respose time feel, at
the cost of making things a bit slower when we aren't piping the
output to a downstream program.
This patch skips the fflush() calls when stdout is a regular file, or
if the environment variable GIT_FLUSH is set to "0". This latter can
speed up a command such as:
GIT_FLUSH=0 strace -c -f -e write time git-rev-list HEAD | wc -l
a tiny amount.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We always quote "unusual" byte values in a pathname using
C-string style, to make it safer for parsing scripts that do not
handle NUL separated records well (or just too lazy to bother).
The absolute minimum bytes that need to be quoted for this
purpose are TAB, LF (and other control characters), double quote
and backslash.
However, we have also always quoted the bytes in high 8-bit
range; this was partly because we were lazy and partly because
we were being cautious.
This introduces an internal "quote_path_fully" variable, and
core.quotepath configuration variable to control it. When set
to false, it does not quote bytes in high 8-bit range anymore
but passes them intact.
The variable defaults to "true" to retain the traditional
behaviour for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ALLOC_GROW macro will never let us fill the array completely,
instead allocating an extra chunk if that would be the case. This is
because the 'nr' argument was originally treated as "how much we do have
now" instead of "how much do we want". The latter makes much more
sense because you can grow by more than one item.
This off-by-one never resulted in an error because it meant we were
overly conservative about when to allocate. Any callers which passed
"how much we have now" need to be updated, or they will fail to allocate
enough.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
so that an ugly commit message like this can be
handled sanely.
Currently, --pretty=oneline and --pretty=email (hence
format-patch) take and use only the first line of the commit log
message. This changes them to:
- Take the first paragraph, where the definition of the first
paragraph is "skip all blank lines from the beginning, and
then grab everything up to the next empty line".
- Replace all line breaks with a whitespace.
This change would not affect a well-behaved commit message that
adheres to the convention of "single line summary, a blank line,
and then body of message", as its first paragraph always
consists of a single line. Commit messages from different
culture, such as the ones imported from CVS/SVN, can however get
chomped with the existing behaviour at the first linebreak in
the middle of sentence right now, which would become much easier
to see with this change.
The Subject: and --pretty=oneline output would become very long
and unsightly for non-conforming commits, but their messages are
already ugly anyway, and thischange at least avoids the loss of
information.
The Subject: line from a multi-line paragraph is folded using
RFC2822 line folding rules at the places where line breaks were
in the original.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is in preparation for keeping two entry lists in the
dir object.
This patch adds and uses the ALLOC_GROW() macro, which
implements the commonly used idiom of growing a dynamic
array using the alloc_nr function (not just in dir.c, but
everywhere).
We also move creation of a dir_entry to dir_entry_new.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time. There are a few files that need
to have trailing whitespaces (most notably, test vectors). The results
still passes the test, and build result in Documentation/ area is unchanged.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup_gdg is used as abbreviation for setup_git_directory_gently.
The work tree can be specified using the environment variable
GIT_WORK_TREE and the config option core.worktree (the environment
variable has precendence over the config option). Additionally
there is a command line option --work-tree which sets the
environment variable.
setup_gdg does the following now:
GIT_DIR unspecified
repository in .git directory
parent directory of the .git directory is used as work tree,
GIT_WORK_TREE is ignored
GIT_DIR unspecified
repository in cwd
GIT_DIR is set to cwd
see the cases with GIT_DIR specified what happens next and
also see the note below
GIT_DIR specified
GIT_WORK_TREE/core.worktree unspecified
cwd is used as work tree
GIT_DIR specified
GIT_WORK_TREE/core.worktree specified
the specified work tree is used
Note on the case where GIT_DIR is unspecified and repository is in cwd:
GIT_WORK_TREE is used but is_inside_git_dir is always true.
I did it this way because setup_gdg might be called multiple
times (e.g. when doing alias expansion) and in successive calls
setup_gdg should do the same thing every time.
Meaning of is_bare/is_inside_work_tree/is_inside_git_dir:
(1) is_bare_repository
A repository is bare if core.bare is true or core.bare is
unspecified and the name suggests it is bare (directory not
named .git). The bare option disables a few protective
checks which are useful with a working tree. Currently
this changes if a repository is bare:
updates of HEAD are allowed
git gc packs the refs
the reflog is disabled by default
(2) is_inside_work_tree
True if the cwd is inside the associated working tree (if there
is one), false otherwise.
(3) is_inside_git_dir
True if the cwd is inside the git directory, false otherwise.
Before this patch is_inside_git_dir was always true for bare
repositories.
When setup_gdg finds a repository git_config(git_default_config) is
always called. This ensure that is_bare_repository makes use of
core.bare and does not guess even though core.bare is specified.
inside_work_tree and inside_git_dir are set if setup_gdg finds a
repository. The is_inside_work_tree and is_inside_git_dir functions
will die if they are called before a successful call to setup_gdg.
Signed-off-by: Matthias Lederhofer <matled@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/pack:
Style nit - don't put space after function names
Ensure the pack index is opened before access
Simplify index access condition in count-objects, pack-redundant
Test for recent rev-parse $abbrev_sha1 regression
rev-parse: Identify short sha1 sums correctly.
Attempt to delay prepare_alt_odb during get_sha1
Micro-optimize prepare_alt_odb
Lazily open pack index files on demand
* maint:
git-config: Improve documentation of git-config file handling
git-config: Various small fixes to asciidoc documentation
decode_85(): fix missing return.
fix signed range problems with hex conversions
* maint-1.5.1:
git-config: Improve documentation of git-config file handling
git-config: Various small fixes to asciidoc documentation
decode_85(): fix missing return.
fix signed range problems with hex conversions
Make hexval_table[] "const". Also make sure that the accessor
function hexval() does not access the table with out-of-range
values by declaring its parameter "unsigned char", instead of
"unsigned int".
With this, gcc can just generate:
movzbl (%rdi), %eax
movsbl hexval_table(%rax),%edx
movzbl 1(%rdi), %eax
movsbl hexval_table(%rax),%eax
sall $4, %edx
orl %eax, %edx
for the code to generate a byte from two hex characters.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* db/remote:
Move refspec pattern matching to match_refs().
Update local tracking refs when pushing
Add handlers for fetch-side configuration of remotes.
Move refspec parser from connect.c and cache.h to remote.{c,h}
Move remote parsing into a library file out of builtin-push.
In some repository configurations the user may have many packfiles,
but all of the recent commits/trees/tags/blobs are likely to
be in the most recent packfile (the one with the newest mtime).
It is therefore common to be able to complete an entire operation
by accessing only one packfile, even if there are 25 packfiles
available to the repository.
Rather than opening and mmaping the corresponding .idx file for
every pack found, we now only open and map the .idx when we suspect
there might be an object of interest in there.
Of course we cannot known in advance which packfile contains an
object, so we still need to scan the entire packed_git list to
locate anything. But odds are users want to access objects in the
most recently created packfiles first, and that may be all they
ever need for the current operation.
Junio observed in b867092f that placing recent packfiles before
older ones can slightly improve access times for recent objects,
without degrading it for historical object access.
This change improves upon Junio's observations by trying even harder
to avoid the .idx files that we won't need.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* np/pack:
deprecate the new loose object header format
make "repack -f" imply "pack-objects --no-reuse-object"
allow for undeltified objects not to be reused
As noted by Johan Herland, git-archive is a kind of checkout and needs
to apply any checkout filters that might be configured.
This patch adds the convenience function convert_sha1_file which returns
a buffer containing the object's contents, after converting, if necessary
(i.e. it's a combination of read_sha1_file and convert_to_working_tree).
Direct calls to read_sha1_file in git-archive are then replaced by calls
to convert_sha1_file.
Since convert_sha1_file expects its path argument to be NUL-terminated --
a convention it inherits from convert_to_working_tree -- the patch also
changes the path handling in archive-tar.c to always NUL-terminate the
string. It used to solely rely on the len field of struct strbuf before.
archive-zip.c already NUL-terminates the path and thus needs no such
change.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Make git notify the user about host resolution/connection attempts.
This is useful both as a progress indicator on slow links, and helps
reassure the user there are no firewall problems.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When we are applying a patch that creates a blob at a path, or
when we are switching from a branch that does not have a blob at
the path to another branch that has one, we need to make sure
that there is nothing at the path in the working tree, as such a
file is a local modification made by the user that would be lost
by the operation.
Normally, lstat() on the path and making sure ENOENT is returned
is good enough for that purpose. However there is a twist. We
may be creating a regular file arch/x86_64/boot/Makefile, while
removing an existing symbolic link at arch/x86_64/boot that
points at existing ../i386/boot directory that has Makefile in
it. We always first check without touching filesystem and then
perform the actual operation, so when we verify the new file,
arch/x86_64/boot/Makefile, does not exist, we haven't removed
the symbolic link arc/x86_64/boot symbolic link yet. lstat() on
the file sees through the symbolic link and reports the file is
there, which is not what we want.
The function has_symlink_leading_path() function takes a path,
and sees if any of the leading directory component is a symbolic
link.
When files in a new directory are created, we tend to process
them together because both index and tree are sorted. The
function takes advantage of this and allows the caller to cache
and reuse which symbolic link on the filesystem caused the
function to return true.
The calling sequence would be:
char last_symlink[PATH_MAX];
*last_symlink = '\0';
for each index entry {
if (!lose)
continue;
if (lstat(it))
if (errno == ENOENT)
; /* happy */
else
error;
else if (has_symlink_leading_path(it, last_symlink))
; /* happy */
else
error; /* would lose local changes */
unlink_entry(it, last_symlink);
}
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add config variables pack.compression and core.loosecompression ,
and switch --compression=level to pack-objects.
Loose objects will be compressed using core.loosecompression if set,
else core.compression if set, else Z_BEST_SPEED.
Packed objects will be compressed using --compression=level if seen,
else pack.compression if set, else core.compression if set,
else Z_DEFAULT_COMPRESSION. This is the "pack compression level".
Loose objects added to a pack undeltified will be recompressed
to the pack compression level if it is unequal to the current
loose compression level by the preceding rules, or if the loose
object was written while core.legacyheaders = true. Newly
deltified loose objects are always compressed to the current
pack compression level.
Previously packed objects added to a pack are recompressed
to the current pack compression level exactly when their
deltification status changes, since the previous pack data
cannot be reused.
In either case, the --no-reuse-object switch from the first
patch below will always force recompression to the current pack
compression level, instead of assuming the pack compression level
hasn't changed and pack data can be reused when possible.
This applies on top of the following patches from Nicolas Pitre:
[PATCH] allow for undeltified objects not to be reused
[PATCH] make "repack -f" imply "pack-objects --no-reuse-object"
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now that we encourage and actively preserve objects in a packed form
more agressively than we did at the time the new loose object format and
core.legacyheaders were introduced, that extra loose object format
doesn't appear to be worth it anymore.
Because the packing of loose objects has to go through the delta match
loop anyway, and since most of them should end up being deltified in
most cases, there is really little advantage to have this parallel loose
object format as the CPU savings it might provide is rather lost in the
noise in the end.
This patch gets rid of core.legacyheaders, preserve the legacy format as
the only writable loose object format and deprecate the other one to
keep things simpler.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds --date={local,relative,default} option to log family of commands,
to allow displaying timestamps in user's local timezone, relative time, or
the default format.
Existing --relative-date option is a synonym of --date=relative; we could
probably deprecate it in the long run.
Signed-off-by: Junio C Hamano <junkio@cox.net>
get_sha1_with_mode basically behaves as get_sha1. It has an additional
parameter for storing the mode of the object.
If the mode can not be determined, it stores S_IFINVALID.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
S_IFINVALID is used to signal, that no mode information is available.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes all low-level functions defined in read-cache.c to
take an explicit index_state structure as their first parameter,
to specify which index to work on. These functions
traditionally operated on "the_index" and were named foo_cache();
the counterparts this patch introduces are called foo_index().
The traditional foo_cache() functions are made into macros that
give "the_index" to their corresponding foo_index() functions.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This defines a index_state structure and moves index-related
global variables into it. Currently there is one instance of
it, the_index, and everybody accesses it, so there is no code
change.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* 'jc/attr': (28 commits)
lockfile: record the primary process.
convert.c: restructure the attribute checking part.
Fix bogus linked-list management for user defined merge drivers.
Simplify calling of CR/LF conversion routines
Document gitattributes(5)
Update 'crlf' attribute semantics.
Documentation: support manual section (5) - file formats.
Simplify code to find recursive merge driver.
Counto-fix in merge-recursive
Fix funny types used in attribute value representation
Allow low-level driver to specify different behaviour during internal merge.
Custom low-level merge driver: change the configuration scheme.
Allow the default low-level merge driver to be configured.
Custom low-level merge driver support.
Add a demonstration/test of customized merge.
Allow specifying specialized merge-backend per path.
merge-recursive: separate out xdl_merge() interface.
Allow more than true/false to attributes.
Document git-check-attr
Change attribute negation marker from '!' to '-'.
...
* lt/gitlink:
Tests for core subproject support
Expose subprojects as special files to "git diff" machinery
Fix some "git ls-files -o" fallout from gitlinks
Teach "git-read-tree -u" to check out submodules as a directory
Teach git list-objects logic to not follow gitlinks
Fix gitlink index entry filesystem matching
Teach "git-read-tree -u" to check out submodules as a directory
Teach git list-objects logic not to follow gitlinks
Don't show gitlink directories when we want "other" files
Teach git-update-index about gitlinks
Teach directory traversal about subprojects
Fix thinko in subproject entry sorting
Teach core object handling functions about gitlinks
Teach "fsck" not to follow subproject links
Add "S_IFDIRLNK" file mode infrastructure for git links
Add 'resolve_gitlink_ref()' helper function
Avoid overflowing name buffer in deep directory structures
diff-lib: use ce_mode_from_stat() rather than messing with modes manually
* np/pack: (27 commits)
document --index-version for index-pack and pack-objects
pack-objects: remove obsolete comments
pack-objects: better check_object() performances
add get_size_from_delta()
pack-objects: make in_pack_header_size a variable of its own
pack-objects: get rid of create_final_object_list()
pack-objects: get rid of reuse_cached_pack
pack-objects: clean up list sorting
pack-objects: rework check_delta_limit usage
pack-objects: equal objects in size should delta against newer objects
pack-objects: optimize preferred base handling a bit
clean up add_object_entry()
tests for various pack index features
use test-genrandom in tests instead of /dev/urandom
simple random data generator for tests
validate reused pack data with CRC when possible
allow forcing index v2 and 64-bit offset treshold
pack-redundant.c: learn about index v2
show-index.c: learn about index v2
sha1_file.c: learn about index version 2
...
The usual process flow is the main process opens and holds the lock to
the index, does its thing, perhaps spawning children during the course,
and then writes the resulting index out by releaseing the lock.
However, the lockfile interface uses atexit(3) to clean it up, without
regard to who actually created the lock. This typically leads to a
confusing behaviour of lock being released too early when the child
exits, and then the parent process when it calls commit_lockfile()
finds that it cannot unlock it.
This fixes the problem by recording who created and holds the lock, and
upon atexit(3) handler, child simply ignores the lockfile the parent
created.
Signed-off-by: Junio C Hamano <junkio@cox.net>
delete_ref function does not change the 'sha1' parameter. Non-const pointer
causes a compiler warning if you call to the function using a const argument.
Signed-off-by: Carlos Rica <jasampler@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This replaces the fairly odd "created_object()" function that did _most_
of the object setup with a more complete "create_object()" function that
also has a more natural calling convention.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We used to use a different allocator scheme for when we didn't know the
object type. That meant that objects that were created without any
up-front knowledge of the type would not go through the same allocation
paths as normal object allocations, and would miss out on the statistics.
But perhaps more importantly than the statistics (that are useful when
looking at memory usage but not much else), if we want to make the
object hash tables use a denser object pointer representation, we need
to make sure that they all go through the same blocking allocator.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
... which consists of existing code split out of packed_delta_info()
for other callers to use it as well.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds "attribute macros" (for lack of better name). So far,
we have low-level attributes such as crlf and diff, which are
defined in operational terms --- setting or unsetting them on a
particular path directly affects what is done to the path. For
example, in order to decline diffs or crlf conversions on a
binary blob, no diffs on PostScript files, and treat all other
files normally, you would have something like these:
* diff crlf
*.ps !diff
proprietary.o !diff !crlf
That is fine as the operation goes, but gets unwieldy rather
rapidly, when we start adding more low-level attributes that are
defined in operational terms. A near-term example of such an
attribute would be 'merge-3way' which would control if git
should attempt the usual 3-way file-level merge internally, or
leave merging to a specialized external program of user's
choice. When it is added, we do _not_ want to force the users
to update the above to:
* diff crlf merge-3way
*.ps !diff
proprietary.o !diff !crlf !merge-3way
The way this patch solves this issue is to realize that the
attributes the user is assigning to paths are not defined in
terms of operations but in terms of what they are.
All of the three low-level attributes usually make sense for
most of the files that sane SCM users have git operate on (these
files are typically called "text'). Only a few cases, such as
binary blob, need exception to decline the "usual treatment
given to text files" -- and people mark them as "binary".
So this allows the $GIT_DIR/info/alternates and .gitattributes
at the toplevel of the project to also specify attributes that
assigns other attributes. The syntax is '[attr]' followed by an
attribute name followed by a list of attribute names:
[attr] binary !diff !crlf !merge-3way
When "binary" attribute is set to a path, if the path has not
got diff/crlf/merge-3way attribute set or unset by other rules,
this rule unsets the three low-level attributes.
It is expected that the user level .gitattributes will be
expressed mostly in terms of attributes based on what the files
are, and the above sample would become like this:
(built-in attribute configuration)
[attr] binary !diff !crlf !merge-3way
* diff crlf merge-3way
(project specific .gitattributes)
proprietary.o binary
(user preference $GIT_DIR/info/attributes)
*.ps !diff
There are a few caveats.
* As described above, you can define these macros only in
$GIT_DIR/info/attributes and toplevel .gitattributes.
* There is no attempt to detect circular definition of macro
attributes, and definitions are evaluated from bottom to top
as usual to fill in other attributes that have not yet got
values. The following would work as expected:
[attr] text diff crlf
[attr] ps text !diff
*.ps ps
while this would most likely not (I haven't tried):
[attr] ps text !diff
[attr] text diff crlf
*.ps ps
* When a macro says "[attr] A B !C", saying that a path does
not have attribute A does not let you tell anything about
attributes B or C. That is, given this:
[attr] text diff crlf
[attr] ps text !diff
*.txt !ps
path hello.txt, which would match "*.txt" pattern, would have
"ps" attribute set to zero, but that does not make text
attribute of hello.txt set to false (nor diff attribute set to
true).
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds the basic infrastructure to assign attributes to
paths, in a way similar to what the exclusion mechanism does
based on $GIT_DIR/info/exclude and .gitignore files.
An attribute is just a simple string that does not contain any
whitespace. They can be specified in $GIT_DIR/info/attributes
file, and .gitattributes file in each directory.
Each line in these files defines a pattern matching rule.
Similar to the exclusion mechanism, a later match overrides an
earlier match in the same file, and entries from .gitattributes
file in the same directory takes precedence over the ones from
parent directories. Lines in $GIT_DIR/info/attributes file are
used as the lowest precedence default rules.
A line is either a comment (an empty line, or a line that begins
with a '#'), or a rule, which is a whitespace separated list of
tokens. The first token on the line is a shell glob pattern.
The rest are names of attributes, each of which can optionally
be prefixed with '!'. Such a line means "if a path matches this
glob, this attribute is set (or unset -- if the attribute name
is prefixed with '!'). For glob matching, the same "if the
pattern does not have a slash in it, the basename of the path is
matched with fnmatch(3) against the pattern, otherwise, the path
is matched with the pattern with FNM_PATHNAME" rule as the
exclusion mechanism is used.
This does not define what an attribute means. Tying an
attribute to various effects it has on git operation for paths
that have it will be specified separately.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
GIT 1.5.1.1
cvsserver: Fix handling of diappeared files on update
fsck: do not complain on detached HEAD.
(encode_85, decode_85): Mark source buffer pointer as "const".
This just adds the basic helper functions to recognize and work with git
tree entries that are links to other git repositories ("subprojects").
They still aren't actually connected up to any of the code-paths, but
now all the infrastructure is in place.
The next commit will start actually adding actual subproject support.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The coming index format change doesn't allow for the number of objects
to be determined from the size of the index file directly. Instead, Let's
initialize a field in the packed_git structure with the object count when
the index is validated since the count is always known at that point.
While at it let's reorder some struct packed_git fields to avoid padding
due to needed 64-bit alignment for some of them.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This merge strategy largely piggy-backs on git-merge-recursive.
When merging trees A and B, if B corresponds to a subtree of A,
B is first adjusted to match the tree structure of A, instead of
reading the trees at the same level. This adjustment is also
done to the common ancestor tree.
If you are pulling updates from git-gui repository into git.git
repository, the root level of the former corresponds to git-gui/
subdirectory of the latter. The tree object of git-gui's toplevel
is wrapped in a fake tree object, whose sole entry has name 'git-gui'
and records object name of the true tree, before being used by
the 3-way merge code.
If you are merging the other way, only the git-gui/ subtree of
git.git is extracted and merged into git-gui's toplevel.
The detection of corresponding subtree is done by comparing the
pathnames and types in the toplevel of the tree.
Heuristics galore! That's the git way ;-).
Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/index-output:
git-read-tree --index-output=<file>
_GIT_INDEX_OUTPUT: allow plumbing to output to an alternative index file.
Conflicts:
builtin-apply.c
This function was not called "add_file_to_cache()" only because
an ancient program, update-cache, used that name as an internal
function name that does something slightly different. Now that
is gone, we can take over the better name.
The plan is to name all functions that operate on the default
index xxx_cache(). Later patches create a variant of them that
take an explicit parameter xxx_index(), and then turn
xxx_cache() functions into macros that use "the_index".
Signed-off-by: Junio C Hamano <junkio@cox.net>
This error message should not usually trigger, but the function
make_cache_entry() called by add_cacheinfo() can return early
without calling into refresh_cache_entry() that sets cache_errno.
Also the error message had a wrong function name reported, and
it did not say anything about which path failed either.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Let's avoid the open coded pack index reference in pack-object and use
nth_packed_object_sha1() instead. This will help encapsulating index
format differences in one place.
And while at it there is no reason to copy SHA1's over and over while a
direct pointer to it in the index will do just fine.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This corrects the interface mistake of the previous one, and
gives a command line parameter to the only plumbing command that
currently needs it: "git-read-tree".
We can add the calls to set_alternate_index_output() to other
plumbing commands that update the index if/when needed.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When defined, this allows plumbing commands that update the
index (add, apply, checkout-index, merge-recursive, mv,
read-tree, rm, update-index, and write-tree) to write their
resulting index to an alternative index file while holding a
lock to the original index file. With this, git-commit that
jumps the index does not have to make an extra copy of the index
file, and more importantly, it can do the update while holding
the lock on the index.
However, I think the interface to let an environment variable
specify the output is a mistake, as shown in the documentation.
If a curious user has the environment variable set to something
other than the file GIT_INDEX_FILE points at, almost everything
will break. This should instead be a command line parameter to
tell these plumbing commands to write the result in the named
file, to prevent stupid mistakes.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Use hash_sha1_file() instead of duplicating code to compute object SHA1.
While at it make it accept a const pointer.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The new configuration variable core.deltaBaseCacheLimit allows the
user to control how much memory they are willing to give to Git for
caching base objects of deltas. This is not normally meant to be
a user tweakable knob; the "out of the box" settings are meant to
be suitable for almost all workloads.
We default to 16 MiB under the assumption that the cache is not
meant to consume all of the user's available memory, and that the
cache's main purpose was to cache trees, for faster path limiters
during revision traversal. Since trees tend to be relatively small
objects, this relatively small limit should still allow a large
number of objects.
On the other hand we don't want the cache to start storing 200
different versions of a 200 MiB blob, as this could easily blow
the entire address space of a 32 bit process.
We evict OBJ_BLOB from the cache first (credit goes to Junio) as
we want to favor OBJ_TREE within the cache. These are the objects
that have the highest inflate() startup penalty, as they tend to
be small and thus don't have that much of a chance to ammortize
that penalty over the entire data.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* sp/run-command:
Use run_command within send-pack
Use run_command within receive-pack to invoke index-pack
Use run_command within merge-index
Use run_command for proxy connections
Use RUN_GIT_CMD to run push backends
Correct new compiler warnings in builtin-revert
Replace fork_with_pipe in bundle with run_command
Teach run-command to redirect stdout to /dev/null
Teach run-command about stdout redirection
Especially with the new index format to come, it is more appropriate
to encapsulate more into check_packed_git_idx() and assume less of the
index format in struct packed_git.
To that effect, the index_base is renamed to index_data with void * type
so it is not used directly but other pointers initialized with it. This
allows for a couple pointer cast removal, as well as providing a better
generic name to grep for when adding support for new index versions or
formats.
And index_data is declared const too while at it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The new builtin-revert code introduces a few new compiler errors
when I'm building with my stricter set of checks enabled in CFLAGS.
These all just stem from trying to store a constant string into
a non-const char*. Simple fix, make the variables const char*.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When accessing objects, we first look for them in packs that
are linked together in the reverse order of discovery.
Since younger packs tend to contain more recent objects, which
are more likely to be accessed often, and local packs tend to
contain objects more relevant to our specific projects, sort the
list of packs before starting to access them. In addition,
favoring local packs over the ones borrowed from alternates can
be a win when alternates are mounted on network file systems.
Signed-off-by: Junio C Hamano <junkio@cox.net>
In order to track and build on top of a branch 'topic' you track from
your upstream repository, you often would end up doing this sequence:
git checkout -b mytopic origin/topic
git config --add branch.mytopic.remote origin
git config --add branch.mytopic.merge refs/heads/topic
This would first fork your own 'mytopic' branch from the 'topic'
branch you track from the 'origin' repository; then it would set up two
configuration variables so that 'git pull' without parameters does the
right thing while you are on your own 'mytopic' branch.
This commit adds a --track option to git-branch, so that "git
branch --track mytopic origin/topic" performs the latter two actions
when creating your 'mytopic' branch.
If the configuration variable branch.autosetupmerge is set to true, you
do not have to pass the --track option explicitly; further patches in
this series allow setting the variable with a "git remote add" option.
The configuration variable is off by default, and there is a --no-track
option to countermand it even if the variable is set.
Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Not all platforms have declared 'unsigned long' to be a 64 bit value,
but we want to support a 64 bit packfile (or close enough anyway)
in the near future as some projects are getting large enough that
their packed size exceeds 4 GiB.
By using off_t, the POSIX type that is declared to mean an offset
within a file, we support whatever maximum file size the underlying
operating system will handle. For most modern systems this is up
around 2^60 or higher.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
As we permit up to 2^32-1 objects in a single packfile we cannot
use a signed int to represent the object offset within a packfile,
after 2^31-1 objects we will start seeing negative indexes and
error out or compute bad addresses within the mmap'd index.
This is a minor cleanup that does not introduce any significant
logic changes. It is roach free.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We shouldn't attempt to assign constant strings into char*, as the
string is not writable at runtime. Likewise we should always be
treating unsigned values as unsigned values, not as signed values.
Most of these are very straightforward. The only exception is the
(unnecessary) xstrdup/free in builtin-branch.c for the detached
head case. Since this is a user-level interactive type program
and that particular code path is executed no more than once, I feel
that the extra xstrdup call is well worth the easy elimination of
this warning.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We did not detect broken loose object files, either when
underlying inflate() signalled the breakage, nor inflate()
finished and we had garbage trailing at the end. We do better
now.
We also make unpack_sha1_file() a static function to
sha1_file.c, since it is not used by anybody outside.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some file systems that can host git repositories and their working copies
do not support symbolic links. But then if the repository contains a symbolic
link, it is impossible to check out the working copy.
This patch enables partial support of symbolic links so that it is possible
to check out a working copy on such a file system. A new flag
core.symlinks (which is true by default) can be set to false to indicate
that the filesystem does not support symbolic links. In this case, symbolic
links that exist in the trees are checked out as small plain files, and
checking in modifications of these files preserve the symlink property in
the database (as long as an entry exists in the index).
Of course, this does not magically make symbolic links work on such defective
file systems; hence, this solution does not help if the working copy relies
on that an entry is a real symbolic link.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This allows for keeping the common idiom which consists of using
negative values to signal error conditions by ensuring that the enum
will be a signed type.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now, show_date() can print three different kinds of dates: normal,
relative and short (%Y-%m-%s) dates.
To achieve this, the "int relative" was changed to "enum date_mode
mode", which has three states: DATE_NORMAL, DATE_RELATIVE and
DATE_SHORT.
Since existing users of show_date() only call it with relative_date
being either 0 or 1, and DATE_NORMAL and DATE_RELATIVE having these
values, no behaviour is changed.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* lt/crlf:
Teach core.autocrlf to 'git apply'
t0020: add test for auto-crlf
Make AutoCRLF ternary variable.
Lazy man's auto-CRLF
* jc/apply-config:
t4119: test autocomputing -p<n> for traditional diff input.
git-apply: guess correct -p<n> value for non-git patches.
git-apply: notice "diff --git" patch again
Fix botched "leak fix"
t4119: add test for traditional patch and different p_value
apply: fix memory leak in prefix_one()
git-apply: require -p<n> when working in a subdirectory.
git-apply: do not lose cwd when run from a subdirectory.
Teach 'git apply' to look at $HOME/.gitconfig even outside of a repository
Teach 'git apply' to look at $GIT_DIR/config
When we do not trust executable bit from lstat(2), we copied
existing ce_mode bits without checking if the filesystem object
is a regular file (which is the only thing we apply the "trust
executable bit" business) nor if the blob in the index is a
regular file (otherwise, we should do the same as registering a
new regular file, which is to default non-executable).
Noticed by Johannes Sixt.
Signed-off-by: Junio C Hamano <junkio@cox.net>
It currently does NOT know about file attributes, so it does its
conversion purely based on content. Maybe that is more in the "git
philosophy" anyway, since content is king, but I think we should try to do
the file attributes to turn it off on demand.
Anyway, BY DEFAULT it is off regardless, because it requires a
[core]
AutoCRLF = true
in your config file to be enabled. We could make that the default for
Windows, of course, the same way we do some other things (filemode etc).
But you can actually enable it on UNIX, and it will cause:
- "git update-index" will write blobs without CRLF
- "git diff" will diff working tree files without CRLF
- "git checkout" will write files to the working tree _with_ CRLF
and things work fine.
Funnily, it actually shows an odd file in git itself:
git clone -n git test-crlf
cd test-crlf
git config core.autocrlf true
git checkout
git diff
shows a diff for "Documentation/docbook-xsl.css". Why? Because we have
actually checked in that file *with* CRLF! So when "core.autocrlf" is
true, we'll always generate a *different* hash for it in the index,
because the index hash will be for the content _without_ CRLF.
Is this complete? I dunno. It seems to work for me. It doesn't use the
filename at all right now, and that's probably a deficiency (we could
certainly make the "is_binary()" heuristics also take standard filename
heuristics into account).
I don't pass in the filename at all for the "index_fd()" case
(git-update-index), so that would need to be passed around, but this
actually works fine.
NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours
truly. I will not guarantee that they work at all reasonable. Caveat
emptor. But it _is_ simple, and it _is_ safe, since it's all off by
default.
The patch is pretty simple - the biggest part is the new "convert.c" file,
but even that is really just basic stuff that anybody can write in
"Teaching C 101" as a final project for their first class in programming.
Not to say that it's bug-free, of course - but at least we're not talking
about rocket surgery here.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Since "git log origin/master" uses dwim_log() to match
"refs/remotes/origin/master", it makes sense to do that for
"git log --reflog", too.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The new interface allows an application to temporarily hash a
small number of objects and pretend that they are available in
the object store without actually writing them.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Back when only handful commands that created commit and tag were
the only users of committer identity information, it made sense
to explicitly call setup_ident() to pre-fill the default value
from the gecos information. But it is much simpler for programs
to make the call automatic when get_ident() is called these days,
since many more programs want to use the information when updating
the reflog.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The code that uses committer_info() in reflog can barf and die
whenever it is asked to update a ref. And I do not think
calling ignore_missing_committer_name() upfront like recent
receive-pack did in the aplication is a reasonable workaround.
What the patch does.
- git_committer_info() takes one parameter. It used to be "if
this is true, then die() if the name is not available due to
bad GECOS, otherwise issue a warning once but leave the name
empty". The reason was because we wanted to prevent bad
commits from being made by git-commit-tree (and its
callers). The value 0 is only used by "git var -l".
Now it takes -1, 0 or 1. When set to -1, it does not
complain but uses the pw->pw_name when name is not
available. Existing 0 and 1 values mean the same thing as
they used to mean before. 0 means issue warnings and leave
it empty, 1 means barf and die.
- ignore_missing_committer_name() and its existing caller
(receive-pack, to set the reflog) have been removed.
- git-format-patch, to come up with the phoney message ID when
asked to thread, now passes -1 to git_committer_info(). This
codepath uses only the e-mail part, ignoring the name. It
used to barf and die. The other call in the same program
when asked to add signed-off-by line based on committer
identity still passes 1 to make sure it barfs instead of
adding a bogus s-o-b line.
- log_ref_write in refs.c, to come up with the name to record
who initiated the ref update in the reflog, passes -1. It
used to barf and die.
The last change means that git-update-ref, git-branch, and
commit walker backends can now be used in a repository with
reflog by somebody who does not have the user identity required
to make a commit. They all used to barf and die.
I've run tests and all of them seem to pass, and also tried "git
clone" as a user whose GECOS is empty -- git clone works again
now (it was broken when reflog was enabled by default).
But this definitely needs extra sets of eyeballs.
Signed-off-by: Junio C Hamano <junkio@cox.net>
For example, it makes no sense to check the presence of a file
named "HEAD" when calling "git log HEAD" in a bare repository.
Noticed by Han-Wen Nienhuys.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Originally I introduced read_or_die for the purpose of reading
the pack header and trailer, and I was too lazy to print proper
error messages.
Linus Torvalds <torvalds@osdl.org>:
> For a read error, at the very least you have to say WHICH FILE
> couldn't be read, because it's usually a matter of some file just
> being too short, not some system-wide problem.
and of course Linus is right. Make it so.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/bare:
Disallow working directory commands in a bare repository.
git-fetch: allow updating the current branch in a bare repository.
Introduce is_bare_repository() and core.bare configuration variable
Move initialization of log_all_ref_updates
* jc/detached-head:
git-checkout: handle local changes sanely when detaching HEAD
git-checkout: safety check for detached HEAD checks existing refs
git-checkout: fix branch name output from the command
git-checkout: safety when coming back from the detached HEAD state.
git-checkout: rewording comments regarding detached HEAD.
git-checkout: do not warn detaching HEAD when it is already detached.
Detached HEAD (experimental)
git-branch: show detached HEAD
git-status: show detached HEAD
We have a number of badly checked read() calls. Often we are
expecting read() to read exactly the size we requested or fail, this
fails to handle interrupts or short reads. Add a read_in_full()
providing those semantics. Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xread().
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We recently introduced a write_in_full() which would either write
the specified object or emit an error message and fail. In order
to fix the read side we now want to introduce a read_in_full()
but without an error emit. This patch cleans up the naming
of this family of calls:
1) convert the existing write_or_whine() to write_or_whine_pipe()
to better indicate its pipe specific nature,
2) convert the existing write_in_full() calls to write_or_whine()
to better indicate its nature,
3) introduce a write_in_full() providing a write or fail semantic,
and
4) convert write_or_whine() and write_or_whine_pipe() to use
write_in_full().
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This allows "git checkout v1.4.3" to dissociate the HEAD of
repository from any branch. After this point, "git branch"
starts reporting that you are not on any branch. You can go
back to an existing branch by saying "git checkout master", for
example.
This is still experimental. While I think it makes sense to
allow commits on top of detached HEAD, it is rather dangerous
unless you are careful in the current form. Next "git checkout
master" will obviously lose what you have done, so we might want
to require "git checkout -f" out of a detached HEAD if we find
that the HEAD commit is not an ancestor of any other branches.
There is no such safety valve implemented right now.
On the other hand, the reason the user did not start the ad-hoc
work on a new branch with "git checkout -b" was probably because
the work was of a throw-away nature, so the convenience of not
having that safety valve might be even better. The user, after
accumulating some commits on top of a detached HEAD, can always
create a new branch with "git checkout -b" not to lose useful
work done while the HEAD was detached.
We'll see.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This removes the old is_bare_git_dir(const char *) to ask if a
directory, if it is a GIT_DIR, is a bare repository, and
replaces it with is_bare_repository(void *). The function looks
at core.bare configuration variable if exists but uses the old
heuristics: if it is ".git" or ends with "/.git", then it does
not look like a bare repository, otherwise it does.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* sp/mmap: (27 commits)
Spell default packedgitlimit slightly differently
Increase packedGit{Limit,WindowSize} on 64 bit systems.
Update packedGit config option documentation.
mmap: set FD_CLOEXEC for file descriptors we keep open for mmap()
pack-objects: fix use of use_pack().
Fix random segfaults in pack-objects.
Cleanup read_cache_from error handling.
Replace mmap with xmmap, better handling MAP_FAILED.
Release pack windows before reporting out of memory.
Default core.packdGitWindowSize to 1 MiB if NO_MMAP.
Test suite for sliding window mmap implementation.
Create pack_report() as a debugging aid.
Support unmapping windows on 'temporary' packfiles.
Improve error message when packfile mmap fails.
Ensure core.packedGitWindowSize cannot be less than 2 pages.
Load core configuration in git-verify-pack.
Fully activate the sliding window pack access.
Unmap individual windows rather than entire files.
Document why header parsing won't exceed a window.
Loop over pack_windows when inflating/accessing data.
...
Conflicts:
cache.h
pack-check.c
It was stupid to link the same element twice to lock_file_list
and end up in a loop, so we certainly need a fix.
But it is not like we are taking a lock on multiple files in
this case. It is just that we leave the linked element on the
list even after commit_lock_file() successfully removes the
cruft.
We cannot remove the list element in commit_lock_file(); if we
are interrupted in the middle of list manipulation, the call to
remove_lock_file_on_signal() will happen with a broken list
structure pointed by lock_file_list, which would cause the cruft
to remain, so not removing the list element is the right thing
to do. Instead we should be reusing the element already on the
list.
There is already a code for that in lock_file() function in
lockfile.c. The code checks lk->next and the element is linked
only when it is not already on the list -- which is incorrect
for the last element on the list (which has NULL in its next
field), but if you read the check as "is this element already on
the list?" it actually makes sense. We do not want to link it
on the list again, nor we would want to set up signal/atexit
over and over.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When passing the revisions list to pack-objects we do not check for
errors nor short writes. Introduce a new write_in_full which will
handle short writes and report errors to the caller. Use this to
short cut the send on failure, allowing us to wait for and report
the child in case the failure is its fault.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Much like the alloc_report() function can be useful to report on
object allocation statistics while debugging the new pack_report()
function can be useful to report on the behavior of the mmap window
code used for packfile access.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This finally turns on the sliding window behavior for packfile data
access by mapping limited size windows and chaining them under the
packed_git->windows list.
We consider a given byte offset to be within the window only if there
would be at least 20 bytes (one hash worth of data) accessible after
the requested offset. This range selection relates to the contract
that use_pack() makes with its callers, allowing them to access
one hash or one object header without needing to call use_pack()
for every byte of data obtained.
In the worst case scenario we will map the same page of data twice
into memory: once at the end of one window and once again at the
start of the next window. This duplicate page mapping will happen
only when an object header or a delta base reference is spanned
over the end of a window and is always limited to just one page of
duplication, as no sane operating system will ever have a page size
smaller than a hash.
I am assuming that the possible wasted page of virtual address
space is going to perform faster than the alternatives, which
would be to copy the object header or ref delta into a temporary
buffer prior to parsing, or to check the window range on every byte
during header parsing. We may decide to revisit this decision in
the future since this is just a gut instinct decision and has not
actually been proven out by experimental testing.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>