mirror of https://github.com/git/git.git synced 2024-11-16 06:03:44 +01:00

172 commits

Author SHA1 Message Date
Jeff King
13494ed14c correct cache_entry allocation
Most cache_entry structs are allocated by using the
cache_entry_size macro, which rounds the size of the struct
up to the nearest multiple of 8 bytes (presumably to avoid
memory fragmentation).

There is one exception: the special "conflict entry" is
allocated with an empty name, and so is explicitly given
just one extra byte to hold the NUL.

However, later code doesn't realize that this particular
struct has been allocated differently, and happily tries
reading and copying it based on the ce_size macro, which
assumes the 8-byte alignment.

This can lead to reading uninitialized data, though since
that data is simply padding, there shouldn't be any problem
as a result. Still, it makes sense to hold the padding
assumption so as not to surprise later maintainers.
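A minimal sketch of the rounding rule, using a hypothetical simplified `struct entry` and an `entry_size()` macro modeled on (not copied from) `cache_entry_size`. It shows why even the empty-named conflict entry should be allocated through the same macro: every later read or copy sized by `entry_size(namelen)` then stays inside the allocation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified index entry: fixed fields, then the NUL-terminated name. */
struct entry {
	unsigned int mode;
	unsigned int namelen;
	char name[]; /* C99 flexible array member */
};

/*
 * Header + name + NUL, rounded up to a multiple of 8 bytes.
 * Adding 8 (not 7) before masking reserves room for the NUL.
 */
#define entry_size(len) ((offsetof(struct entry, name) + (len) + 8) & ~(size_t)7)

/* Every entry, including an empty-named "conflict" one, goes through entry_size(). */
static struct entry *alloc_entry(const char *name)
{
	size_t len = strlen(name);
	struct entry *e = calloc(1, entry_size(len));

	if (!e)
		return NULL;
	e->namelen = (unsigned int)len;
	memcpy(e->name, name, len + 1);
	return e;
}
```

Allocating the conflict entry as `alloc_entry("")` gives it the same padded size that any `entry_size(0)`-based copy later assumes.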

This fixes valgrind errors in t1005, t3030, t4002, and
t4114.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-01 23:46:34 -07:00
Junio C Hamano
5521883490 checkout: do not lose staged removal
The logic to check out a different commit implements the safety to never
lose the user's local changes.  For example, switching from a commit to
another commit, when you have changed a path that is different between
them, needs to merge your changes to the version from the switched-to
commit, which you may not necessarily be able to resolve easily.  By
default, "git checkout" refuses to switch branches, to give you a chance
to stash your local changes (or use "-m" to merge, accepting the risks of
getting conflicts).

This safety, however, had one deliberate hole since early June 2005.  When
your local change was to remove a path (and optionally to stage that
removal), the command checked out the path from the switched-to commit
nevertheless.

This was to allow an initial checkout to happen smoothly (e.g. an initial
checkout is done by starting with an empty index and switching from the
commit at the HEAD to the same commit).  We can tighten the rule slightly
to allow this special case to pass, without losing sight of removal
explicitly done by the user, by noticing if the index is truly empty when
the operation begins.

For historical background, see:

    http://thread.gmane.org/gmane.comp.version-control.git/4641/focus=4646

This case is marked as *0* in the message, about which both Linus and I said "it
feels somewhat wrong but otherwise we cannot start from an empty index".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-09 22:55:22 -07:00
Junio C Hamano
913e0e99b6 unpack_trees(): protect the handcrafted in-core index from read_cache()
unpack_trees() rebuilds the in-core index from scratch by allocating a new
structure and finishing it off by copying the built one to the final
index.

The resulting in-core index is OK for most uses, but read_cache() does not
recognize it as such.  The function is meant to be a no-op if you have already
loaded the index, until you call discard_cache().

This changes the way read_cache() detects an already initialized in-core
index, by introducing an extra bit, and marks the handcrafted in-core
index as initialized, to avoid this problem.
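A sketch, assuming a pared-down `struct index_state` and a stand-in `load_from_disk()` (both hypothetical, as is this simplified `read_index()`), of how an explicit "initialized" bit lets a handcrafted in-core index count as already loaded until it is discarded.

```c
/* Hypothetical, pared-down in-core index. */
struct index_state {
	unsigned int cache_nr;
	unsigned initialized : 1;
	int reads_from_disk; /* counts simulated on-disk reads, for illustration */
};

/* Stand-in for parsing the on-disk index; real code would open and parse a file. */
static int load_from_disk(struct index_state *istate)
{
	istate->reads_from_disk++;
	istate->cache_nr = 3; /* pretend the file held three entries */
	return 0;
}

/* A no-op once initialized, even if the in-core index was built by hand. */
static int read_index(struct index_state *istate)
{
	if (istate->initialized)
		return (int)istate->cache_nr;
	if (load_from_disk(istate) < 0)
		return -1;
	istate->initialized = 1;
	return (int)istate->cache_nr;
}

/* Clearing the bit is what makes the next read_index() go back to disk. */
static void discard_index(struct index_state *istate)
{
	istate->cache_nr = 0;
	istate->initialized = 0;
}
```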

A better fix in the longer term would be to change the read_cache() API so
that it will always discard and re-read from the on-disk index to avoid
confusion.  But there are higher-level APIs that have relied on the current
semantics, and they and their users all need to get converted, which is
outside the scope of 'maint' track.

An example of such a higher level API is write_cache_as_tree(), which is
used by git-write-tree as well as later Porcelains like git-merge, revert
and cherry-pick.  In the longer term, we should remove read_cache() from
there and add one to cmd_write_tree(); other callers expect that the
in-core index they prepared is what gets written as a tree so no other
change is necessary for this particular codepath.

The original version of this patch marked the index by pointing
o->result.alloc at an otherwise wasted block of malloc'ed memory, but this
version uses Linus's idea to use a new "initialized" bit, which is
conceptually much cleaner.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-23 18:09:27 -07:00
Junio C Hamano
2e2b887d1c unpack_trees(): allow callers to differentiate worktree errors from merge errors
Instead of uniformly returning -1 on any error, this teaches
unpack_trees() to return -2 when the merge itself is Ok but worktree
refuses to get updated.
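For illustration only, a caller might map the two failure codes onto distinct outcomes like this, for example to print a different message when only the work tree was the problem; the enum and function names are hypothetical.

```c
/* Hypothetical caller-side classification of unpack_trees()-style return values. */
enum unpack_outcome {
	UNPACK_OK,
	UNPACK_MERGE_FAILED,	/* -1: the merge itself failed */
	UNPACK_WORKTREE_REFUSED	/* -2: merge was OK, but the work tree could not be updated */
};

static enum unpack_outcome classify_unpack_result(int ret)
{
	if (ret == 0)
		return UNPACK_OK;
	if (ret == -2)
		return UNPACK_WORKTREE_REFUSED;
	return UNPACK_MERGE_FAILED;
}
```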

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-29 17:35:21 -07:00
Junio C Hamano
8ccba008ee unpack-trees: allow Porcelain to give different error messages
The plumbing output is sacred as it is an API.  We _could_ change it if it
is broken in such a way that it cannot convey necessary information fully,
but we just do not _reword_ for the sake of rewording.  If somebody does
not like it, s/he is complaining too late.  S/he should have been here in
early May 2005 and made the language used by the API closer to what humans
read.  S/he wasn't here.  Too bad, and it is too late.

And people who complain should look at a bigger picture.  Look at what was
suggested by one of them and think for five seconds:

     $ git checkout mytopic
    -fatal: Entry 'frotz' not uptodate. Cannot merge.
    +fatal: Entry 'frotz' has local changes. Cannot merge.

If you do not see something wrong with this output, your brain has already
been rotten with use of git for too long a time.  Nobody asked us to
"merge" but why are we talking about "Cannot merge"?

This patch introduces a mechanism to allow Porcelains to specify messages
that are different from the ones given by the underlying plumbing
implementation of read-tree, so that we can reword the message Porcelains give
without disrupting the output from the plumbing.

    $ git-checkout pu
    error: You have local changes to 'Makefile'; cannot switch branches.

There are other places that ask unpack_trees() to n-way merge, detect
issues and let it issue error messages on its own, but I did this as a
demonstration and replaced only one message.

Yes I know about C99 structure initializers.  I'd love to use them but we
try to be nice to compilers without it.
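A minimal sketch of such a mechanism, with hypothetical names (`struct error_msgs`, `not_uptodate_file`, `report_not_uptodate()`): the Porcelain supplies a message template, the plumbing falls back to its own wording when the field is NULL, and the field is assigned explicitly rather than with a C99 designated initializer, in keeping with the note above.

```c
#include <stdio.h>
#include <string.h>

/* Plumbing's wording: part of the "API", so it does not change. */
#define PLUMBING_NOT_UPTODATE "Entry '%s' not uptodate. Cannot merge."

/* Hypothetical per-caller message overrides; NULL means "use the plumbing wording". */
struct error_msgs {
	const char *not_uptodate_file;
};

/* Format the error for path into buf, using the caller's template if it gave one. */
static int report_not_uptodate(char *buf, size_t size,
			       const struct error_msgs *msgs, const char *path)
{
	const char *fmt = PLUMBING_NOT_UPTODATE;

	if (msgs && msgs->not_uptodate_file)
		fmt = msgs->not_uptodate_file;
	return snprintf(buf, size, fmt, path);
}
```

A checkout-like caller would set `msgs.not_uptodate_file = "You have local changes to '%s'; cannot switch branches.";` and pass `&msgs`, while read-tree passes NULL and keeps the plumbing text.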

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-19 19:30:13 -07:00
Linus Torvalds
c40641b77b Optimize symlink/directory detection
This is the base for making symlink detection in the middle of a pathname
saner and (much) more efficient.

Under various loads, we want to verify that the full path leading up to a
filename is a real directory tree, and that when we successfully do an
'lstat()' on a filename, we don't get a false positive due to a symlink in
the middle of the path that git should have seen as a symlink, not as a
normal path component.

The 'has_symlink_leading_path()' function already did this, and cached
a single level of symlink information, but didn't cache the _lack_ of a
symlink, so the normal behaviour was actually the wrong way around, and we
ended up doing an 'lstat()' on each path component to check that it was a
real directory.

This caches the last detected full directory and symlink entries, and
speeds up deep directory structures in particular by avoiding an
lstat() of every directory leading up to each entry in the index.
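A simplified sketch of the caching idea, assuming a hypothetical `probe` callback that stands in for `lstat()` plus `S_ISLNK()` so it can run without a real filesystem. Unlike the commit, it caches only the last verified directory, not the last symlink, and `fake_probe()` exists only to count probes in the example.

```c
#include <string.h>

#define MAX_PATH_LEN 4096

/* Hypothetical probe standing in for lstat() + S_ISLNK(): 1 if prefix is a symlink. */
typedef int (*probe_fn)(const char *prefix);

/* Remembers the last leading directory verified to contain no symlinks. */
struct dir_cache {
	char dir[MAX_PATH_LEN];
	size_t len;
};

/*
 * Returns 1 if a leading component of path is a symlink, 0 if not,
 * -1 if path is too long. Prefixes covered by the cache are not probed.
 */
static int has_symlink_leading_path_cached(struct dir_cache *c, const char *path,
					   probe_fn probe)
{
	char buf[MAX_PATH_LEN];
	size_t len = strlen(path);
	const char *slash;

	if (len >= sizeof(buf))
		return -1;
	memcpy(buf, path, len + 1);
	for (slash = strchr(path, '/'); slash; slash = strchr(slash + 1, '/')) {
		size_t plen = (size_t)(slash - path);

		/* Is path[0..plen) a leading part of the cached directory? */
		if (plen <= c->len && !strncmp(path, c->dir, plen) &&
		    (plen == c->len || c->dir[plen] == '/'))
			continue;
		buf[plen] = '\0';
		if (probe(buf))
			return 1;
		buf[plen] = '/';
		/* Every shorter prefix is verified too, so cache this one. */
		memcpy(c->dir, path, plen);
		c->dir[plen] = '\0';
		c->len = plen;
	}
	return 0;
}

/* Demo probe: only "a/link" is a symlink; counts calls to show the savings. */
static int probe_calls;
static int fake_probe(const char *prefix)
{
	probe_calls++;
	return strcmp(prefix, "a/link") == 0;
}
```

For `a/b/c/file1` it probes `a`, `a/b` and `a/b/c`; for a sibling such as `a/b/c/file2` it probes nothing.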

[ This can - and should - probably be extended upon so that we eventually
  never do a bare 'lstat()' on any path entries at *all* when checking the
  index, but always check the full path carefully. Right now we do not
  generally check the whole path for all our normal quick index
  revalidation.

  We should also make sure that we're careful about all the invalidation,
  ie when we remove a link and replace it by a directory we should
  invalidate the symlink cache if it matches (and vice versa for the
  directory cache).

  But regardless, the basic function needs to be sane to do that. The old
  'has_symlink_leading_path()' was not capable enough - or indeed the code
  readable enough - to really do that sanely. So I'm pushing this as not
  just an optimization, but as a base for further work. ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-10 18:16:31 -07:00
Linus Torvalds
1fa6ead492 Make unpack-tree update removed files before any updated files
This is immaterial on sane filesystems, but if you have a broken (aka
case-insensitive) filesystem, and the objective is to remove the file
'abc' and replace it with the file 'Abc', then we must make sure to do
the removal first.

Otherwise, you'd first update the file 'Abc' - which would just
overwrite the file 'abc' due to the broken case-insensitive filesystem -
and then remove file 'abc' - which would now brokenly remove the just
updated file 'Abc' on that broken filesystem.

By doing removals first, this won't happen.
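The ordering fix amounts to two passes over the pending work-tree changes. A minimal sketch, with a hypothetical `struct pending` standing in for index entries and their update/remove flags:

```c
#include <stddef.h>

enum pending_op { OP_NONE, OP_REMOVE, OP_UPDATE };

/* Hypothetical work-tree change queued by the merge, in index (sorted) order. */
struct pending {
	const char *path;
	enum pending_op op;
};

/*
 * Fill out[] with the order in which changes should hit the work tree:
 * all removals first, then all updates. Returns the number of entries written.
 */
static size_t order_worktree_changes(const struct pending *list, size_t n,
				     const struct pending **out)
{
	size_t i, k = 0;

	for (i = 0; i < n; i++)
		if (list[i].op == OP_REMOVE)
			out[k++] = &list[i];
	for (i = 0; i < n; i++)
		if (list[i].op == OP_UPDATE)
			out[k++] = &list[i];
	return k;
}
```

Because `Abc` sorts before `abc` in byte order, a single pass would update `Abc` first; the two-pass order removes `abc` before writing `Abc`.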

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-09 01:22:25 -07:00
Linus Torvalds
32260ad5db Make branch merging aware of underlying case-insensitive filesystems
If we find an unexpected file, see if that filename perhaps exists in a
case-insensitive way in the index, and whether the file matches that. If
so, ignore it as a known pre-existing file of a different name.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-09 01:22:25 -07:00
Linus Torvalds
cd2fef59ed Make hash_name_lookup able to do case-independent lookups
Right now nobody uses it, but "index_name_exists()" gets a flag so
you can enable it on a case-by-case basis.
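A minimal sketch of one way to serve both exact and case-insensitive lookups from a single hash: hash the case-folded name so `abc` and `Abc` share a chain, compare exactly or case-insensitively depending on the flag, and return the entry itself rather than a yes/no answer. The table layout and names are hypothetical, and case folding here is ASCII-only via `tolower()` in the C locale.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NR_BUCKETS 64

/* Hypothetical, minimal name-hash entry and table. */
struct name_entry {
	const char *name;
	struct name_entry *next;
};

struct name_table {
	struct name_entry *bucket[NR_BUCKETS];
};

/* Hash case-folded bytes, so "abc" and "Abc" always share a chain. */
static unsigned int fold_hash(const char *s)
{
	unsigned int h = 0x123;

	for (; *s; s++)
		h = h * 101 + (unsigned int)tolower((unsigned char)*s);
	return h % NR_BUCKETS;
}

static void table_add(struct name_table *t, struct name_entry *e)
{
	unsigned int b = fold_hash(e->name);

	e->next = t->bucket[b];
	t->bucket[b] = e;
}

static int same_name(const char *a, const char *b, int icase)
{
	if (!icase)
		return !strcmp(a, b);
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b;
}

/* Return the matching entry, or NULL if there is none. */
static struct name_entry *name_lookup(const struct name_table *t,
				      const char *name, int icase)
{
	struct name_entry *e;

	for (e = t->bucket[fold_hash(name)]; e; e = e->next)
		if (same_name(e->name, name, icase))
			return e;
	return NULL;
}
```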

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-09 01:22:25 -07:00
Linus Torvalds
df292c791a Make "index_name_exists()" return the cache_entry it found
This allows verify_absent() in unpack_trees() to use the hash chains
rather than looking it up using the binary search.

Perhaps more importantly, it's also going to be useful for the next phase,
where we actually start looking at the cache entry when we do
case-insensitive lookups and checking the result.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-09 01:22:25 -07:00
Junio C Hamano
c4758d3c93 Fix read-tree not to discard errors
This fixes the issue identified with recently added tests to t1004

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-18 22:17:22 -07:00
Linus Torvalds
7f8ab8dc07 Don't update unchanged merge entries
In commit 34110cd4e3 ("Make 'unpack_trees()'
have a separate source and destination index") I introduced a really
stupid bug in that it would always add merged entries with the CE_UPDATE
flag set. That caused us to always re-write the file, even when it was
already up-to-date in the source index.

Not only is that really stupid from a performance angle, but more
importantly it's actively wrong: if we have dirty state in the tree when
we merge, overwriting it with the result of the merge will incorrectly
overwrite that dirty state.

This trivially fixes the problem - simply don't set the CE_UPDATE flag
when the merge result matches the old state.
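A minimal sketch of the corrected rule, using a hypothetical pared-down entry and an illustrative `CE_UPDATE` value: request a work-tree write only when there is no old entry or the merged result differs from it.

```c
#include <string.h>

#define CE_UPDATE 0x1u /* illustrative value: "write this entry to the work tree" */

/* Hypothetical, pared-down entry: object name, mode, flags. */
struct entry {
	unsigned char sha1[20];
	unsigned int mode;
	unsigned int flags;
};

static int same_content(const struct entry *a, const struct entry *b)
{
	return a->mode == b->mode && !memcmp(a->sha1, b->sha1, sizeof(a->sha1));
}

/* Only schedule a checkout when the merge result differs from the old entry. */
static void mark_merged_entry(struct entry *merged, const struct entry *old)
{
	if (old && same_content(old, merged))
		merged->flags &= ~CE_UPDATE;
	else
		merged->flags |= CE_UPDATE;
}
```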

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-16 14:25:53 -07:00
Linus Torvalds
fac4b32887 Fix recent 'unpack_trees()'-related changes breaking 'git stash'
On Sat, 15 Mar 2008, SZEDER Gábor wrote:
>
> The testcase usually fails during the first 25 run, but sometimes it
> runs more than 100 times before failing.

Damn, this series has had more subtle issues than I ever expected.

'git stash' creates its saved working tree object with:

        # state of the working tree
        w_tree=$( (
                rm -f "$TMP-index" &&
                cp -p ${GIT_INDEX_FILE-"$GIT_DIR/index"} "$TMP-index" &&
                GIT_INDEX_FILE="$TMP-index" &&
                export GIT_INDEX_FILE &&
                git read-tree -m $i_tree &&
                git add -u &&
                git write-tree &&
                rm -f "$TMP-index"
        ) ) ||
                die "Cannot save the current worktree state"

which creates a new index file with the updates, and writes the tree from
that.

We have this logic where we compare the timestamp of the index with the
timestamp of the files and we then write them out "smudged" if they are
the same, and it basically depends on the fact that the date on the index
file is compared with the date encoded in the stat information itself.

And what is going on is:

 - we create a new index file with that "cp". We are careful to preserve
   the timestamps by using "-p", so this one should be all ok.

 - then we *update* that index by resetting it to the tree with git
   read-tree, but now we do *not* preserve the timestamp on this new copy
   any more, even though we copy over all the timestamps on the files that
   are indexed from the stat information!

Now, we always had that problem when re-writing the index, but we had this
clever workaround in the writing part: if the source had racily clean
entries, then when we wrote those out (and thus can't depend on the index
file timestamp showing that they are racily clean any more!), we would
smudge them when writing.

IOW, we handle this issue by having write_index() do this:

	for (i = 0; i < entries; i++) {
		...
		if (is_racy_timestamp(istate, ce))
			ce_smudge_racily_clean_entry(ce);
		..

when writing out entries. And that all took care of it, because now when
we wrote the new index, we'd change the timestamp on the index, yes, but
we'd smudge the entries we wrote out, so now the resulting index would
still show that file as not-up-to-date any more.

But with commit 34110cd4e3 ("Make
'unpack_trees()' have a separate source and destination index"), this
logic no longer triggers, because we now write out the "result" index, and
that one never got its timestamp updated from the source index, so it had
lost all that "is_racy_timestamp()" information!

This trivial patch fixes it. It looks trivial, and it's a simple fix, but
boy did it take me way too much thinking and explaining to myself to
explain why there was a problem in the first place!

The trivial fix is to just copy the index timestamp from the source index
into the result index. But we only do this if we *have* a source index, of
course, and if we will even bother to use the result.
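A reduced model of the problem and the fix, with hypothetical types: an entry is racily clean when its mtime is not older than the index file's timestamp, and smudging is modeled as zeroing the recorded size (an assumption of this sketch). A result index whose timestamp was never set cannot detect racy entries; copying the source timestamp restores the check.

```c
#include <time.h>

/* Hypothetical, pared-down entry. */
struct entry {
	time_t mtime;		/* file mtime recorded in the entry */
	unsigned long size;	/* 0 after smudging: forces a content check later */
};

/* Hypothetical, pared-down index. */
struct index_state {
	time_t timestamp;	/* mtime of the index file it came from; 0 if none */
};

/* Racily clean: the file may have changed in the same second the index was written. */
static int is_racy_timestamp(const struct index_state *istate, const struct entry *ce)
{
	return istate->timestamp && istate->timestamp <= ce->mtime;
}

/* What a write_index()-style loop does per entry, reduced to the racy check. */
static void write_entry(const struct index_state *istate, struct entry *ce)
{
	if (is_racy_timestamp(istate, ce))
		ce->size = 0; /* smudge */
}
```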

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-14 23:35:55 -07:00
Junio C Hamano
ca885a4fe6 read-tree() and unpack_trees(): use consistent limit
read-tree -m can read up to MAX_TREES, which was arbitrarily set to 8 since
August 2007 (4 trees are needed to deal with the case of two merge bases).

However, the updated unpack_trees() code had an advertised limit of 4
(which it enforced).  In reality the code was prepared to take only 3
trees and giving 4 caused it to stomp on its stack.  Rename the MAX_TREES
constant to MAX_UNPACK_TREES, move it to the unpack-trees.h common header
file, and use it from both places to avoid future confusion.
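A sketch of the "one shared limit, checked before use" pattern, assuming a hypothetical `struct tree_arg` and `collect_trees()`; the value 8 is the read-tree limit described above. Because the array size and the check use the same constant, the two can no longer drift apart.

```c
#include <stdio.h>

/* Shared limit, as if from a common header. */
#define MAX_UNPACK_TREES 8

/* Hypothetical stand-in for a parsed tree argument. */
struct tree_arg {
	const char *name;
};

/*
 * Copy up to MAX_UNPACK_TREES tree arguments into a fixed-size array.
 * Returns the count, or -1 if there are too many (instead of overrunning).
 */
static int collect_trees(struct tree_arg out[MAX_UNPACK_TREES],
			 const char *const *names, int nr)
{
	int i;

	if (nr > MAX_UNPACK_TREES) {
		fprintf(stderr, "too many trees (max %d)\n", MAX_UNPACK_TREES);
		return -1;
	}
	for (i = 0; i < nr; i++)
		out[i].name = names[i];
	return nr;
}
```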

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-13 23:56:36 -07:00
Linus Torvalds
20a16eb33e unpack_trees(): fix diff-index regression.
When skip_unmerged option is not given, unpack_trees() should not just
skip unmerged cache entries but keep them in the result for the caller to
sort them out.

For callers other than diff-index, the incoming index should never be
unmerged, but diff-index is a special case caller.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-10 23:51:13 -07:00
Junio C Hamano
542c264b01 traverse_trees_recursive(): propagate merge errors up
There were a few places where merge errors detected deeper in the call chain
were ignored and not propagated up the call chain to the caller.

Most notably, this caused switching branches with "git checkout" to ignore
a path that is modified in the work tree and also differs between the HEAD
version and the commit being switched to; the code noticed this internally
but ignored it, resulting in an incorrect two-way merge and loss of the
change in the work tree.
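The fix pattern is to return the first negative status from each recursive call instead of discarding it. A sketch over a hypothetical tree of nodes:

```c
#include <stddef.h>

/* Hypothetical tree node; status < 0 marks a merge error detected at this node. */
struct node {
	int status;
	const struct node *const *children;
	size_t nr_children;
};

/* Depth-first walk that returns the first negative status instead of dropping it. */
static int walk(const struct node *n)
{
	size_t i;

	if (n->status < 0)
		return n->status;
	for (i = 0; i < n->nr_children; i++) {
		int ret = walk(n->children[i]);

		if (ret < 0)
			return ret; /* propagate up the call chain */
	}
	return 0;
}
```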

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-10 01:26:23 -07:00
Linus Torvalds
1caeacc1f2 unpack_trees(): minor memory leak fix in unused destination index
This adds a "discard_index(&o->result)" to the failure path, to reclaim
memory from an in-core index we built but ended up not using.

The *big* memory leak comes from the fact that we leak the cache_entry
things left and right. That's a very traditional and deliberate leak:
because we used to build up the cache entries by just mapping them
directly in from the index file (and we emulate that in modern times
by allocating them from one big array), we can't actually free them
one-by-one.

So doing the "discard_index()" will free the hash tables etc, which is
good, and it will free the "istate->alloc" but that is never set on the
result because we don't get the result from the index read. So we don't
actually free the individual cache entries themselves that got created
from the trees.

That's not something new, btw. We never did. But some day we should just
add a flag to the cache_entry struct saying that it's a "free one by one" kind, and
then we could/should do it. In the meantime, this one-liner will fix
*some* of the memory leaks, but not that old traditional one.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09 01:03:45 -08:00
Linus Torvalds
34110cd4e3 Make 'unpack_trees()' have a separate source and destination index
We will always unpack into our own internal index, but we will take the
source from wherever specified, and we will optionally write the result
to a specified index (optionally, because not everybody even _wants_ any
result: the index diffing really wants to just walk the tree and index
in parallel).

This ends up removing a fair number more lines than it adds, for the
simple reason that we can now skip all the crud that tried to be
oh-so-careful about maintaining our position in the index as we were
traversing and modifying it.  Since we don't actually modify the source
index any more, we can just update the 'o->pos' pointer without worrying
about whether an index entry got removed or replaced or added to.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09 01:03:38 -08:00
Linus Torvalds
bc052d7f43 Make 'unpack_trees()' take the index to work on as an argument
This is just a very mechanical conversion, and makes everybody set it to
'&the_index' before calling, but at least it makes it more explicit
where we work with the index.

The next stage would be to split that index usage up into a 'source' and
a 'destination' index, so that we can unpack into a different index than
we started out from.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09 00:43:48 -08:00
Linus Torvalds
01904572a5 Move 'unpack_trees()' over to 'traverse_trees()' interface
This not only deletes more code than it adds, it gets rid of a
singularly hard-to-understand function (unpack_trees_rec()), and
replaces it with a set of smaller and simpler functions that use the
generic tree traversal mechanism to walk over one or more git trees in
parallel.

It's still not the most wonderful interface, and by no means is the new
code easy to understand either, but it's at least a bit less opaque.
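A minimal sketch of walking several sorted name lists in parallel, the core idea behind a traverse_trees()-style interface: at each step take the smallest name at the head of any list and record which lists contain it. The function name is hypothetical, and it uses plain `strcmp()` order, ignoring git's rule that directory entries sort as if their names ended in '/'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LISTS 3

/*
 * Walk n sorted name lists in parallel. For each distinct name, record it and
 * a bitmask of which lists contain it (bit i set for list i). The output
 * arrays must hold at least the sum of lens[]. Returns the number of names.
 */
static size_t walk_in_parallel(const char *const *lists[], const size_t lens[],
			       size_t n, const char *names_out[],
			       unsigned int masks_out[])
{
	size_t pos[MAX_LISTS] = { 0 };
	size_t count = 0;

	assert(n <= MAX_LISTS);
	for (;;) {
		const char *first = NULL;
		unsigned int mask = 0;
		size_t i;

		/* The smallest name at the head of any list is processed next. */
		for (i = 0; i < n; i++)
			if (pos[i] < lens[i] &&
			    (!first || strcmp(lists[i][pos[i]], first) < 0))
				first = lists[i][pos[i]];
		if (!first)
			break;
		/* Consume it from every list that has it. */
		for (i = 0; i < n; i++)
			if (pos[i] < lens[i] && !strcmp(lists[i][pos[i]], first)) {
				mask |= 1u << i;
				pos[i]++;
			}
		names_out[count] = first;
		masks_out[count] = mask;
		count++;
	}
	return count;
}
```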

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09 00:43:47 -08:00
Junio C Hamano
5a4d707a6d Merge branch 'db/checkout'
* db/checkout: (21 commits)
  checkout: error out when index is unmerged even with -m
  checkout: show progress when checkout takes long time while switching branches
  Add merge-subtree back
  checkout: updates to tracking report
  builtin-checkout.c: Remove unused prefix arguments in switch_branches path
  checkout: work from a subdirectory
  checkout: tone down the "forked status" diagnostic messages
  Clean up reporting differences on branch switch
  builtin-checkout.c: fix possible usage segfault
  checkout: notice when the switched branch is behind or forked
  Build in checkout
  Move code to clean up after a branch change to branch.c
  Library function to check for unmerged index entries
  Use diff -u instead of diff in t7201
  Move create_branch into a library file
  Build-in merge-recursive
  Add "skip_unmerged" option to unpack_trees.
  Discard "deleted" cache entries after using them to update the working tree
  Send unpack-trees debugging output to stderr
  Add flag to make unpack_trees() not print errors.
  ...

Conflicts:

	Makefile
2008-02-27 12:53:26 -08:00
Linus Torvalds
e85486450e Be more verbose when checkout takes a long time
So I find it irritating when git thinks for a long time without telling me
what's taking so long. And by "long time" I definitely mean less than two
seconds, which is already way too long for me.

This hits me when doing a large pull and the checkout takes a long time,
or when just switching to another branch that is old and again checkout
takes a while.

Now, git read-tree already had support for the "-v" flag that does nice
updates about what's going on, but it was delayed by two seconds, and if
the thing had already done more than half by then it would be quiet even
after that, so in practice it meant that we might be quiet for up to four
seconds. Much too long.

So this patch changes the timeout to just one second, which makes it much
more palatable to me.
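The quiet-period arithmetic can be modeled with a small predicate (hypothetical, not git's progress code). If more than half the work is done when the delay expires, the remainder takes less than the delay at a steady rate, so the worst-case silence is just under twice the delay: up to about 4 seconds with a 2-second delay, and about 2 seconds with a 1-second delay.

```c
/*
 * Hypothetical model of the delayed-progress rule: stay quiet until
 * delay_sec has passed, and stay quiet for good if more than half of
 * the work was already done by then.
 */
static int should_show_progress(double elapsed_sec, double delay_sec,
				unsigned long done, unsigned long total)
{
	if (elapsed_sec < delay_sec)
		return 0;
	/* Widen before doubling so large counts cannot overflow. */
	return (unsigned long long)done * 2 <= total;
}
```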

The other thing this patch does is that "git checkout" now doesn't disable
the "-v" flag when doing its thing, and only disables the output when
given the -q flag.  When allowing "checkout -m" to fall back to a 3-way
merge, the users will see the error message from straight "checkout",
so we tell them that we are falling back, to make it look less scary.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-24 10:01:13 -08:00
Linus Torvalds
eb7a2f1d50 Use helper function for copying index entry information
We used to just memcpy() the index entry when we copied the stat() and
SHA1 hash information, which worked well enough back when the index
entry was just an exact bit-for-bit representation of the information on
disk.

However, these days we actually have various management information in
the cache entry too, and we should be careful to not overwrite it when
we copy the stat information from another index entry.
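A sketch of such a helper, assuming a hypothetical split of the flags word into on-disk bits and in-core state bits (`CE_STATE_MASK` and the struct are illustrative): copy everything from the source, then restore the destination's in-core state.

```c
/* Illustrative split: high bits hold in-core-only state, low bits go to disk. */
#define CE_STATE_MASK 0xffff0000u

/* Hypothetical, pared-down entry. */
struct entry {
	unsigned int mtime, size, mode;
	unsigned char sha1[20];
	unsigned int flags;
};

/* Copy stat data, object name and on-disk flags, but keep dst's in-core state. */
static void copy_entry_info(struct entry *dst, const struct entry *src)
{
	unsigned int state = dst->flags & CE_STATE_MASK;

	*dst = *src;
	dst->flags = (src->flags & ~CE_STATE_MASK) | state;
}
```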

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-22 21:24:47 -08:00
Junio C Hamano
987e315a6b Merge branch 'jc/gitignore-ends-with-slash'
* jc/gitignore-ends-with-slash:
  gitignore: lazily find dtype
  gitignore(5): Allow "foo/" in ignore list to match directory "foo"
2008-02-16 17:57:06 -08:00
Daniel Barkalow
4e7c4571b8 Add "skip_unmerged" option to unpack_trees.
This option allows the caller to reset everything that isn't unmerged,
leaving the unmerged things to be resolved. If, after a merge of
"working" and "HEAD", this is used with "HEAD" (reset, !update), the
result will be that all of the changes from "local" are in the working
tree but not added to the index (either with the index clean but
unchanged, or with the index unmerged, depending on whether there are
conflicts).

This will be used in checkout -m.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
2008-02-09 23:16:51 -08:00
Daniel Barkalow
33ecf7eb61 Discard "deleted" cache entries after using them to update the working tree
Way back in read-tree.c, we used a mode 0 cache entry to indicate that
an entry had been deleted, so that the update code would remove the
working tree file, and we would just skip it when writing out the
index file afterward.

These days, unpack_trees is a library function, and it is still
leaving these entries in the active cache. Furthermore, unpack_trees
doesn't correctly ignore those entries, and who knows what other code
wouldn't expect them to be there, but just isn't yet called after a
call to unpack_trees. To avoid having other code trip over these
entries, have check_updates() remove them after it removes the working
tree files.

While we're at it, simplify the loop in check_updates(), and avoid
passing global variables as parameters to check_updates(): there is
only one call site anyway.
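A sketch of the compaction step, assuming a hypothetical `CE_REMOVE` flag in place of the mode-0 marker described above; survivors keep their order.

```c
#include <stddef.h>

#define CE_REMOVE 0x1u /* hypothetical: entry stands for a deleted path */

/* Hypothetical, pared-down entry. */
struct entry {
	const char *name;
	unsigned int flags;
};

/*
 * After the work tree has been updated, drop "deleted" entries from the
 * in-core index so later code never sees them. Returns the new count.
 */
static size_t drop_removed_entries(struct entry *cache, size_t nr)
{
	size_t src, dst = 0;

	for (src = 0; src < nr; src++) {
		if (cache[src].flags & CE_REMOVE)
			continue; /* its work-tree file is assumed already removed */
		cache[dst++] = cache[src];
	}
	return dst;
}
```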

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
2008-02-09 23:16:51 -08:00
Daniel Barkalow
b05c6dff8a Send unpack-trees debugging output to stderr
This is to keep git-stash from getting confused if you're debugging
unpack-trees.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
2008-02-09 23:16:51 -08:00
Daniel Barkalow
17e4642667 Add flag to make unpack_trees() not print errors.
(This applies only to errors where a plausible operation is impossible due
to the particular data, not to errors resulting from misuse of the merge
functions.)

This will allow builtin-checkout to suppress merge errors if it's
going to try more merging methods.

Additionally, if unpack_trees() returns with an error, but without
printing anything, it will roll back any changes to the index (by
rereading the index, currently). This obviously could be done by the
caller, but chances are that the caller would forget and debugging
this is difficult. Also, future implementations may give unpack_trees() a
more efficient way of undoing its changes than the caller could.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
2008-02-09 23:16:51 -08:00
Daniel Barkalow
203a2fe117 Allow callers of unpack_trees() to handle failure
Return an error from unpack_trees() instead of calling die(), and exit
with an error in read-tree, builtin-commit, and diff-lib. merge-recursive
already expected an error return from unpack_trees, so it doesn't need to
be changed. The merge function can return negative to abort.

This will be used in builtin-checkout -m.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
2008-02-09 23:16:51 -08:00
Junio C Hamano
6831a88ac0 gitignore: lazily find dtype
When we process "foo/" entries in gitignore files on a system
that does not have d_type member in "struct dirent", the earlier
implementation ran lstat(2) separately when matching with
entries that came from the command line, in-tree .gitignore
files, and $GIT_DIR/info/excludes file.

This optimizes it by delaying the lstat(2) call until it becomes
absolutely necessary.
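A minimal sketch of the lazy lookup, assuming literal (non-glob) patterns and a hypothetical `enum path_type` standing in for `d_type`: a pattern with a trailing slash matches only directories, and `lstat(2)` runs only after the name itself has matched, then the answer is cached for the path.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/stat.h>

/* Hypothetical stand-in for d_type when struct dirent does not provide it. */
enum path_type { TYPE_UNKNOWN, TYPE_DIR, TYPE_OTHER };

static int lstat_calls; /* for illustration: how often we asked the filesystem */

/* Find the type on first use only; later calls reuse the cached answer. */
static enum path_type resolve_type(const char *path, enum path_type *type)
{
	if (*type == TYPE_UNKNOWN) {
		struct stat st;

		lstat_calls++;
		if (!lstat(path, &st) && S_ISDIR(st.st_mode))
			*type = TYPE_DIR;
		else
			*type = TYPE_OTHER;
	}
	return *type;
}

/*
 * Match basename against one literal exclude pattern. A trailing slash
 * ("foo/") restricts the match to directories.
 */
static int pattern_matches(const char *pattern, const char *basename,
			   const char *path, enum path_type *type)
{
	size_t plen = strlen(pattern);
	int dir_only = plen && pattern[plen - 1] == '/';

	if (dir_only)
		plen--;
	if (strlen(basename) != plen || strncmp(pattern, basename, plen))
		return 0;
	return !dir_only || resolve_type(path, type) == TYPE_DIR;
}
```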

The initial idea for this change was by Jeff King, but I
optimized it further to pass pointers around.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-05 00:46:49 -08:00
Junio C Hamano
d6b8fc303b gitignore(5): Allow "foo/" in ignore list to match directory "foo"
A pattern "foo/" in the exclude list did not match directory
"foo", but a pattern "foo" did.  This attempts to extend the
exclude mechanism so that "foo/" matches the directory "foo" while not
matching a regular file or a symbolic link "foo".  In order to differentiate a
directory and non directory, this passes down the type of path
being checked to excluded() function.

A downside is that the recursive directory walk may need to run
lstat(2) more often on systems whose "struct dirent" does not give
the type of the entry; earlier it did not have to do so for an
excluded path, but we now need to figure out if a path is a
directory before deciding to exclude it.  This is especially bad
because an idea similar to the earlier CE_UPTODATE optimization
to reduce the number of lstat(2) calls would by definition not apply
to the codepaths involved, as (1) directories will not be
registered in the index, and (2) excluded paths will not be in
the index anyway.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-05 00:46:49 -08:00
Linus Torvalds
7a51ed66f6 Make on-disk index representation separate from in-core one
This converts the index explicitly on read and write to its on-disk
format, allowing the in-core format to contain more flags, and be
simpler.

In particular, the in-core format is now host-endian (as opposed to the
on-disk one that is network endian in order to be able to be shared
across machines) and as a result we can dispense with all the
htonl/ntohl on accesses to the cache_entry fields.

This will make it easier to make use of various temporary flags that do
not exist in the on-disk format.
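A minimal sketch of converting between a big-endian on-disk layout and host-endian in-core fields. The struct layouts are hypothetical, and byte-wise helpers are used instead of `ntohl()`/`htonl()` so the sketch depends neither on alignment nor on `<arpa/inet.h>`.

```c
#include <stdint.h>

/* Hypothetical on-disk header: fields stored as big-endian (network order) bytes. */
struct ondisk_entry {
	unsigned char mtime[4];
	unsigned char size[4];
	unsigned char flags[2];
};

/* Hypothetical in-core entry: host-endian fields, with room for in-core-only flags. */
struct entry {
	uint32_t mtime;
	uint32_t size;
	uint32_t flags; /* low 16 bits come from disk; higher bits are in-core only */
};

static uint32_t get_be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Convert once on read; in-core code then uses the fields directly. */
static void entry_from_disk(struct entry *ce, const struct ondisk_entry *od)
{
	ce->mtime = get_be32(od->mtime);
	ce->size = get_be32(od->size);
	ce->flags = (uint32_t)od->flags[0] << 8 | od->flags[1];
}

/* Convert back on write; in-core-only flag bits above 16 are not stored. */
static void entry_to_disk(struct ondisk_entry *od, const struct entry *ce)
{
	put_be32(od->mtime, ce->mtime);
	put_be32(od->size, ce->size);
	od->flags[0] = (unsigned char)(ce->flags >> 8);
	od->flags[1] = (unsigned char)ce->flags;
}
```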

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-21 12:44:31 -08:00
Linus Torvalds
f2fdd10ab7 unpack-trees: FLEX_ARRAY fix
In unpack-trees.c (line 593), we do

	..
	if (same(old, merge)) {
		*merge = *old;
	} else {
	..

and that "merge" is a cache_entry pointer. If we have a non-zero
FLEX_ARRAY size, it will cause us to copy the first few bytes of the
name too.

That is technically wrong even for FLEX_ARRAY being 1, but you'll never
notice, since the filenames should always be the same with the current
code.  But if we do the same thing for a rename, we'd be screwed.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-18 01:10:24 -08:00
Junio C Hamano
63d285c849 per-directory-exclude: lazily read .gitignore files
Operations that walk directories or trees, which potentially need to
consult the .gitignore files, used to always try to open the .gitignore
file every time they entered a new directory, even when they ended up
not needing to call excluded() function to see if a path in the
directory is ignored.  This was done by push/pop exclude_per_directory()
functions that managed the data in a stack.

This changes the directory walking API to remove the need to call these
two functions.  Instead, the directory walk data structure caches the
data used by excluded() function the last time, and lazily reuses it as
much as possible.  Of the data the last check used, data from deeper
directories that do not contain the path being checked is discarded,
data from the common leading directories is reused, and then the
directories between the common directory and the directory containing
the path being checked are checked for a .gitignore file.  This is very
similar to the way gitattributes are handled.
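A minimal sketch of the reuse rule, with a hypothetical `struct exclude_cache` and `enter_directory()`: directory names end in '/', "" is the top level, and reading a .gitignore file is modeled as a counter. Data for the shared leading directories is kept, deeper data is dropped, and one file is read per new directory level.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIR_LEN 4096

/* Hypothetical record of which directories' .gitignore data is loaded. */
struct exclude_cache {
	char dir[MAX_DIR_LEN];
	size_t len;
	int files_read; /* stand-in for actually reading .gitignore files */
};

/* Prepare exclude data for paths inside dir, reusing what is already loaded. */
static void enter_directory(struct exclude_cache *c, const char *dir)
{
	size_t len = strlen(dir), i, common = 0;

	assert(len < MAX_DIR_LEN);
	/* Find the leading directories both the old and new paths share. */
	for (i = 0; i < len && i < c->len && dir[i] == c->dir[i]; i++)
		if (dir[i] == '/')
			common = i + 1;
	/* Data past `common` is discarded; read one file per new level. */
	for (i = common; i < len; i++)
		if (dir[i] == '/')
			c->files_read++;
	memcpy(c->dir, dir, len + 1);
	c->len = len;
}
```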

This API change also fixes "ls-files -c -i", which called excluded()
without setting up the gitignore data via the old push/pop functions.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-29 02:19:14 -08:00
Junio C Hamano
c78a24986d Merge branch 'jc/maint-add-sync-stat'
* jc/maint-add-sync-stat:
  t2200: test more cases of "add -u"
  git-add: make the entry stat-clean after re-adding the same contents
  ce_match_stat, run_diff_files: use symbolic constants for readability

Conflicts:

	builtin-add.c
2007-11-14 14:15:40 -08:00
Junio C Hamano
4bd5b7dacc ce_match_stat, run_diff_files: use symbolic constants for readability
ce_match_stat() can be told:

 (1) to ignore CE_VALID bit (used under "assume unchanged" mode)
     and perform the stat comparison anyway;

 (2) not to perform the contents comparison for racily clean
     entries and report mismatch of cached stat information;

using its "option" parameter.  Give them symbolic constants.

Similarly, run_diff_files() can be told not to report anything
on removed paths.  Also give it a symbolic constant for that.
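A toy version showing how named option bits read at the call site. The names follow git's `CE_MATCH_*` style, but their values and the pared-down `struct entry_state` are assumptions of this sketch.

```c
/* Option bits (values are illustrative). */
#define CE_MATCH_IGNORE_VALID  01 /* check stat data even under "assume unchanged" */
#define CE_MATCH_RACY_IS_DIRTY 02 /* report racily clean entries as changed */

/* Hypothetical, pared-down view of one entry compared with its file. */
struct entry_state {
	int assume_unchanged; /* the CE_VALID bit */
	int stat_differs;
	int racily_clean;
	int content_differs;
};

/* Returns nonzero if the entry should be treated as changed. */
static int match_stat(const struct entry_state *ce, unsigned int options)
{
	if (ce->assume_unchanged && !(options & CE_MATCH_IGNORE_VALID))
		return 0;
	if (ce->stat_differs)
		return 1;
	if (ce->racily_clean)
		return (options & CE_MATCH_RACY_IS_DIRTY) ? 1 : ce->content_differs;
	return 0;
}
```

A call such as `match_stat(ce, CE_MATCH_IGNORE_VALID | CE_MATCH_RACY_IS_DIRTY)` says what it asks for, where `match_stat(ce, 3)` does not.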

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-10 00:24:51 -08:00
Nicolas Pitre
4d4fcc5451 relax usage of the progress API
Since it is now OK to pass a null pointer to display_progress() and
stop_progress() resulting in a no-op, we can simplify the code
and remove a bunch of lines by not making those calls conditional all
the time.
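A sketch of a NULL-tolerant progress API. Because every call is a no-op on NULL, a caller can hold a NULL progress when quiet and drop its conditionals. The signatures are assumptions modeled on the function names above, not git's exact API; in a real split the header would only declare `struct progress;` so the layout stays private.

```c
#include <stdio.h>
#include <stdlib.h>

/* Layout would live in the .c file; callers only handle the pointer. */
struct progress {
	const char *title;
	unsigned int total, last;
};

/* Hypothetical constructor; returns NULL on allocation failure. */
static struct progress *start_progress(const char *title, unsigned int total)
{
	struct progress *p = malloc(sizeof(*p));

	if (p) {
		p->title = title;
		p->total = total;
		p->last = 0;
	}
	return p;
}

/* A NULL progress is a no-op, so callers need no "if (verbose)" around calls. */
static void display_progress(struct progress *p, unsigned int n)
{
	if (!p)
		return;
	p->last = n;
	fprintf(stderr, "%s: %u/%u\r", p->title, n, p->total);
}

/* Finish, free, and clear the caller's pointer; also a no-op on NULL. */
static void stop_progress(struct progress **p_progress)
{
	struct progress *p = *p_progress;

	if (!p)
		return;
	fprintf(stderr, "%s: %u/%u, done.\n", p->title, p->last, p->total);
	free(p);
	*p_progress = NULL;
}
```

A caller can then write `p = verbose ? start_progress("Checking out files", n) : NULL;` and call `display_progress(p, i)` unconditionally.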

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-30 16:08:40 -07:00
Nicolas Pitre
dc6a0757c4 make struct progress an opaque type
This allows for better management of progress "object" existence,
as well as making the progress display implementation more independent
from its callers.
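Making a type opaque in C usually means declaring only "struct name;" in the header and keeping the full definition in the .c file. A single-file sketch with a hypothetical struct meter standing in for struct progress:

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what a header would expose: an incomplete (opaque) type ---- */
struct meter;
struct meter *meter_new(unsigned total);
void meter_tick(struct meter *m);
unsigned meter_done(const struct meter *m);
void meter_free(struct meter *m);

/* ---- private to the implementation file: the full definition ---- */
struct meter {
	unsigned total;
	unsigned done;
};

struct meter *meter_new(unsigned total)
{
	struct meter *m = malloc(sizeof(*m));

	if (m) {
		m->total = total;
		m->done = 0;
	}
	return m;
}

void meter_tick(struct meter *m)
{
	if (m && m->done < m->total)
		m->done++;
}

unsigned meter_done(const struct meter *m)
{
	return m ? m->done : 0;
}

void meter_free(struct meter *m)
{
	free(m);
}
```

Callers that see only the declaration can pass a struct meter * around but cannot read its fields, so the layout can change without touching them.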

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-30 16:08:40 -07:00
Nicolas Pitre
42e18fbf5f more compact progress display
Each progress display can fit on a single line instead of two.

[sp: Changed "Checking files out" to "Checking out files" at
     Johannes Sixt's suggestion as it better explains the
     action that is taking place]

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-17 02:54:55 -04:00
Linus Torvalds
566b5c057c Optimize the three-way merge of git-read-tree
As mentioned, the three-way case *should* be as trivial as the
following. It passes all the tests, and I verified that a conflicting
merge in the 100,000 file horror-case merged correctly (with the conflict
markers) in 0.687 seconds with this, so it works, but I'm lazy and
somebody else should double-check it [jc: followed all three-way merge
codepaths and verified it removes when it should].

Without this patch, the merge took 8.355 seconds, so this patch
really does make a huge difference for merge performance with lots and
lots of files, and we're not talking percentages, we're talking
orders-of-magnitude differences!

Now "unpack_trees()" is just fast enough that we don't need to avoid it
(although it's probably still a good idea to eventually convert it to use
the traverse_trees() infrastructure some day - just to avoid having
extraneous tree traversal functions).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-10 23:02:14 -07:00
Linus Torvalds
d699676dda Optimize the two-way merge of git-read-tree too
This trivially optimizes the two-way merge case of git-read-tree too,
which affects switching branches.

When you have tons and tons of files in your repository, but there are
only small differences in the branches (maybe just a couple of files
changed), the biggest cost of the branch switching was actually just the
index calculations.

This fixes it (timings for switching between the "testing" and "master"
branches in the 100,000 file testing-repo-from-hell, where the branches
only differ in one small file).

Before:
	[torvalds@woody bummer]$ time git checkout master
	real    0m9.919s
	user    0m8.461s
	sys     0m0.264s

After:
	[torvalds@woody bummer]$ time git checkout testing
	real    0m0.576s
	user    0m0.348s
	sys     0m0.228s

so it's easily an order of magnitude different.

This concludes the series. I think we could/should do the three-way merge
too (to speed up merges), but I'm lazy. Somebody else can do it.

The rule is very simple: you need to remove the old entry if:
 - you want to remove the file entirely
 - you replace it with a "merge conflict" entry (ie a non-stage-0 entry)

and you can avoid removing it if you either

 - keep the old one
 - or resolve it to a new one.

and these rules should all be valid for the three-way case too.
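The rule above can be expressed as a small decision function. The enum below is a hypothetical summary of a merge function's result, not a type from unpack-trees.c:

```c
#include <assert.h>

/* Hypothetical summary of what a merge function decided for one path. */
enum merge_outcome {
	OUTCOME_REMOVE,     /* the file goes away entirely */
	OUTCOME_CONFLICT,   /* replaced by non-stage-0 conflict entries */
	OUTCOME_KEEP,       /* the old entry stays as it is */
	OUTCOME_RESOLVED    /* resolved to a single new stage-0 entry */
};

/*
 * Returns nonzero when the old index entry must be removed; otherwise it
 * can be kept or updated in place.
 */
static int must_remove_old_entry(enum merge_outcome o)
{
	switch (o) {
	case OUTCOME_REMOVE:
	case OUTCOME_CONFLICT:
		return 1;
	case OUTCOME_KEEP:
	case OUTCOME_RESOLVED:
		return 0;
	}
	assert(!"unknown merge outcome");
	return 1;
}
```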

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-10 14:00:25 -07:00
Linus Torvalds
288f072ec0 Optimize the common cases of git-read-tree
This optimizes bind_merge() and oneway_merge() to not unnecessarily
remove and re-add the old index entries when they can just get replaced
by updated ones.

This makes these operations much faster for large trees (where "large"
is in the 50,000+ file range), because we don't unnecessarily move index
entries around in the index array all the time.
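Why in-place replacement is cheaper can be sketched with a sorted array: removing and re-adding an entry moves every later entry twice, while overwriting an entry whose path is unchanged moves nothing. The index layout below is a hypothetical simplification, not git's struct cache_entry:

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Hypothetical, simplified index entry: a path plus a content version. */
struct entry {
	const char *path;
	int version;
};

/* Hypothetical index: entries kept sorted by path in a flat array. */
struct index_sketch {
	struct entry e[MAX_ENTRIES];
	int nr;
};

/* Removing shifts every later entry left: O(n) memmove. */
static void remove_at(struct index_sketch *idx, int pos)
{
	assert(pos >= 0 && pos < idx->nr);
	memmove(&idx->e[pos], &idx->e[pos + 1],
		(size_t)(idx->nr - pos - 1) * sizeof(idx->e[0]));
	idx->nr--;
}

/* Re-adding shifts every later entry right again: another O(n) memmove. */
static void insert_at(struct index_sketch *idx, int pos, struct entry ne)
{
	assert(pos >= 0 && pos <= idx->nr && idx->nr < MAX_ENTRIES);
	memmove(&idx->e[pos + 1], &idx->e[pos],
		(size_t)(idx->nr - pos) * sizeof(idx->e[0]));
	idx->e[pos] = ne;
	idx->nr++;
}

/* Same path means same sorted position, so overwrite in place: O(1). */
static void replace_at(struct index_sketch *idx, int pos, struct entry ne)
{
	assert(pos >= 0 && pos < idx->nr);
	assert(strcmp(idx->e[pos].path, ne.path) == 0);
	idx->e[pos] = ne;
}
```

With 100,000 entries, the remove-and-re-add path moves on the order of the whole tail of the array for every updated path, which is the cost the commit avoids.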

Using the "bummer" tree (a test-tree with 100,000 files) we get:

Before:
	[torvalds@woody bummer]$ time git commit -m"Change one file" 50/500
	real    0m9.470s
	user    0m8.729s
	sys     0m0.476s

After:
	[torvalds@woody bummer]$ time git commit -m"Change one file" 50/500
	real    0m1.173s
	user    0m0.720s
	sys     0m0.452s

so for large trees this is easily very noticeable indeed.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-10 14:00:11 -07:00
Linus Torvalds
b48d5a050a Move old index entry removal from "unpack_trees()" into the individual functions
This makes no changes to current code, but it allows the individual merge
functions to decide what to do about the old entry.  They might decide to
update it in place, rather than force them to always delete and re-add it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-10 13:59:19 -07:00
Linus Torvalds
933bf40a5c Start moving unpack-trees to "struct tree_desc"
This doesn't actually change any real code, but it changes the interface
to unpack_trees() to take an array of "struct tree_desc" entries, the same
way the tree-walk.c functions do.

The reason for this is that we would be much better off if we can do the
tree-unpacking using the generic "traverse_trees()" functionality instead
of having to use the special "unpack" infrastructure.

This really is a pretty minimal diff, just to change the calling
convention. It passes all the tests, and looks sane. There were only two
users of "unpack_trees()": builtin-read-tree and merge-recursive, and I
tried to keep the changes minimal.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-10 02:30:44 -07:00
Junio C Hamano
936492d3cf unpack-trees.c: assume submodules are clean during check-out
Sven originally raised this issue:

    If you have a submodule checked out and you go back (or
    forward) to a revision of the supermodule that contains a
    different revision of the submodule and then switch to
    another revision, it will complain that the submodule is not
    uptodate, because git simply didn't update the submodule in
    the first move.

The current policy is to consider it perfectly normal for a
checked-out submodule to be out of sync wrt the supermodule index.
At least until we introduce a superproject repository
configuration option that says "in this repository, I do care
about this submodule and at any time I move around in the
superproject, recursively check out the submodule to match", it
is a reasonable policy, as we currently do not recursively
checkout the submodules at all.  The most extreme case of this
policy is that the superproject index knows about the submodule
but the subdirectory does not even have to be checked out.

The function verify_uptodate(), called during the two-way merge
aka branch switching, is about "make sure the filesystem entity
that corresponds to this cache entry is up to date, lest we lose
the local modifications".  As we explicitly allow submodule
checkout to drift from the supermodule index entry, the check
should say "Ok, for submodules, not matching is the norm" for
now.
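A sketch of the submodule exemption. 0160000 is the mode git records for a gitlink entry; the check function and its worktree_matches parameter are hypothetical simplifications of verify_uptodate():

```c
#include <assert.h>

/* File-type mask and the mode git records for a submodule ("gitlink"). */
#define MODE_TYPE_MASK 0170000
#define MODE_GITLINK   0160000

static int is_gitlink(unsigned int mode)
{
	return (mode & MODE_TYPE_MASK) == MODE_GITLINK;
}

/*
 * Hypothetical verify_uptodate()-style check.  worktree_matches says
 * whether the working tree still matches the index entry.  Returns 0 if
 * it is safe to proceed, -1 if local modifications would be lost.
 */
static int verify_uptodate_sketch(unsigned int mode, int worktree_matches)
{
	if (is_gitlink(mode))
		return 0;   /* submodule checkouts may drift: not an error */
	return worktree_matches ? 0 : -1;
}
```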

Later when we have the ability to mark "I care about this
submodule to be always in sync with the superproject" (thereby
implementing automatic recursive checkout and perhaps diff,
among other things), we should check if the submodule in
question is marked as such and perform the current test.

Acked-by: Lars Hjemli <hjemli@gmail.com>
Acked-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-05 10:55:55 -07:00
René Scharfe
1843d8d545 cleanup unpack-trees.c: shrink struct tree_entry_list
Remove the two write-only fields executable and symlink from struct
tree_entry_list.  Also replace usage of the field directory with
S_ISDIR checks on the mode field, and then remove this now obsolete
field, too.  Noticed by David Kastrup.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-24 17:28:10 -07:00
Sven Verdoolaege
0cf7375542 unpack-trees.c: assume submodules are clean during check-out
In particular, when moving back to a commit without a given submodule
and then moving back forward to a commit with the given submodule,
we shouldn't complain that updating would lose untracked files in
the submodule, because git currently does not checkout subprojects
during superproject check-out.

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-18 17:01:00 -07:00
Junio C Hamano
ec0603e13c Teach read-tree 2-way merge to ignore intermediate symlinks
Earlier in 16a4c61, we taught "read-tree -m -u" not to be
confused when switching from a branch that has a path frotz/filfre
to another branch that has a symlink frotz that points at xyzzy/
directory.  The fix was incomplete in that it was still confused
when coming back (i.e. switching from a branch with frotz -> xyzzy/
to another branch with frotz/filfre).

This fix is rather expensive in that for a path that is created
we would need to see if any of the leading components of that
path exists as a symbolic link in the filesystem (in which case,
we know that path itself does not exist, and the fact we already
decided to check it out tells us that in the index we already
know that symbolic link is going away as there is no D/F
conflict).
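A sketch of such a leading-symlink check using lstat(2) on each leading prefix of the path. The helper name and the fixed 4096-byte buffer are assumptions for illustration; this is not git's has_symlink_leading_path():

```c
#define _POSIX_C_SOURCE 200809L   /* for lstat() and S_ISLNK() */
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if any leading directory component of
 * path (everything before the last '/') is a symbolic link on disk.
 */
static int leading_path_has_symlink(const char *path)
{
	char buf[4096];
	size_t len = strlen(path);
	struct stat st;
	char *slash;

	if (len >= sizeof(buf))
		return 0;   /* sketch: over-long paths are not handled */
	memcpy(buf, path, len + 1);

	/* Skip a leading '/' so that an absolute path does not yield "". */
	for (slash = strchr(buf + (buf[0] == '/'), '/'); slash;
	     slash = strchr(slash + 1, '/')) {
		*slash = '\0';              /* look at the prefix up to here */
		if (lstat(buf, &st) != 0)
			return 0;           /* missing: nothing below it exists */
		if (S_ISLNK(st.st_mode))
			return 1;
		*slash = '/';               /* restore, move to the next one */
	}
	return 0;
}
```

Each leading component costs one lstat() call, which is where the expense mentioned above comes from.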

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-12 02:22:53 -07:00
Junio C Hamano
7df6ddf51e Merge branch 'maint-1.5.1' into maint
* maint-1.5.1:
  annotate: make it work from subdirectories.
  git-config: Correct asciidoc documentation for --int/--bool
  t1300: Add tests for git-config --bool --get
  unpack-trees.c: verify_uptodate: remove dead code
  Use PATH_MAX instead of TEMPFILE_PATH_LEN
  branch: fix segfault when resolving an invalid HEAD
2007-05-20 19:57:00 -07:00
Sven Verdoolaege
0a76f66524 unpack-trees.c: verify_uptodate: remove dead code
This code was killed by commit fcc387db9b.

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-20 14:40:41 -07:00
Junio C Hamano
16a4c6176a read-tree -m -u: avoid getting confused by intermediate symlinks.
When switching from a branch with both x86_64/boot/Makefile and
i386/boot/Makefile to another branch that has x86_64/boot as a
symlink pointing at ../i386/boot, the code incorrectly removed
i386/boot/Makefile.

This was because we first removed everything under x86_64/boot
to make room to create a symbolic link x86_64/boot, then removed
x86_64/boot/Makefile which no longer exists but now is pointing
at i386/boot/Makefile, thanks to the symlink we just created.

This fixes it by using the has_symlink_leading_path() function
introduced previously for git-apply in the checkout codepath.
Earlier, "git checkout" was broken in t4122 test due to this
bug, and the test had an extra "git reset --hard" as a
workaround, which is removed because it is not needed anymore.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-11 22:33:31 -07:00
Nicolas Pitre
55a9137d8a delay progress display when checking out files
Let's start displaying progress only if more than 50% of the total number
of files remain to be checked out after 2 seconds.
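The delay rule can be sketched as a pure decision function. The name and parameters are hypothetical; the 2-second delay and 50% threshold come from the commit message:

```c
#include <assert.h>

/*
 * Hypothetical decision helper for a delayed progress meter: once the
 * delay has elapsed, start showing progress only if more than half of
 * the work is still left.
 */
static int should_start_showing(unsigned done, unsigned total,
				double elapsed_sec, double delay_sec)
{
	assert(done <= total);
	if (elapsed_sec < delay_sec)
		return 0;                           /* too early to decide */
	return (total - done) * 2 > total;          /* more than 50% remains */
}
```

For example, with 100 files and a 2-second delay, 10 files done at 2.5 seconds starts the display, while 60 done does not.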

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-22 22:18:05 -07:00
Nicolas Pitre
13aaf14825 make progress "title" part of the common progress interface
If the progress bar ends up in a box, better provide a title for it too.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-22 22:18:05 -07:00
Nicolas Pitre
96a02f8f6d common progress display support
Instead of having this code duplicated in multiple places, let's have
a common interface for progress display.  If someday someone wishes to
display a cheesy progress bar instead then only one file will have to
be changed.

Note: I left merge-recursive.c out since it has a strange notion of
progress as it apparently increases the expected total number as it goes.
Someone with more intimate knowledge of what that is supposed to mean
might look at converting it to the common progress interface.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-22 22:18:05 -07:00
Junio C Hamano
4c4caafc9c Treat D/F conflict entry more carefully in unpack-trees.c::threeway_merge()
This fixes three buglets in threeway_merge() regarding D/F
conflict entries.

* After finishing with path D and handling path D/F, some stages
  have D/F conflict entries, which are obviously non-NULL.  For the
  purpose of determining if the path D/F is missing in the
  ancestor, they should not be taken into account.

* D/F conflict entry is a marker to say "this stage does _not_
  have the path", so do not send them to keep_entry().
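A sketch of treating the D/F conflict marker as "path absent". The sentinel and helpers are hypothetical simplifications, not the actual unpack-trees.c code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cache entry. */
struct entry_sketch {
	const char *path;
};

/*
 * Hypothetical sentinel: a stage holding this pointer does NOT have the
 * path; it only marks a directory/file conflict.
 */
static struct entry_sketch df_conflict_marker = { "" };

/* A stage has the path only if it holds a real entry, not the marker. */
static int stage_has_path(const struct entry_sketch *e)
{
	return e != NULL && e != &df_conflict_marker;
}

/* Ancestor (stage 1) check: the marker counts as "missing", like NULL. */
static int missing_in_ancestor(const struct entry_sketch *stages[])
{
	return !stage_has_path(stages[1]);
}
```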

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 12:55:51 -07:00
Junio C Hamano
ea4b52a86f t1000: fix case table.
Case #10 is not handled by unpack-trees.c:threeway_merge()
internally, unless under the aggressive rule, and it is not a
bug.  As the test expects, ND (one side did not do anything,
other side deleted) case was meant to be handled by the caller's
policy (e.g. git-merge-one-file or git-merge-recursive).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 12:55:51 -07:00
Junio C Hamano
c81935348b Fix switching to a branch with D/F when current branch has file D.
This loosens the over-eager verify_absent() check that gets
upset to find directory D in the current working tree when
switching to a branch that has a file there.  The check needs to
make sure that we do not lose precious working tree files as a
result of removing directory D and replacing it with the file
from the other branch, which is a tad expensive but this is a
less common case.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-04 00:25:10 -07:00
Junio C Hamano
b8ba1535bd Fix twoway_merge that passed d/f conflict marker to merged_entry().
When switching from one tree to another, we should not send a
marker that says "this file does not exist in the new tree -- I
am a placeholder to tell you that, and not a real blob" down to
merged_entry() as the result of the merge.
2007-04-04 00:19:29 -07:00
Junio C Hamano
9a4d8fdc25 unpack-trees: get rid of *indpos parameter.
This variable keeps track of which entry in the original index
the traversal is looking at, and belongs to the unpack_trees_options
structure along with other traversal status information.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-04 00:19:28 -07:00
Junio C Hamano
7f7932ab25 unpack_trees.c: pass unpack_trees_options structure to keep_entry() as well.
Other decision functions, deleted_entry() and merged_entry(), take one as
their parameter, and this function should too.  I'll be introducing a
separate index to build the result in, and am planning to pass it as part
of the structure.
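A sketch of the direction these two commits take: traversal state such as the index cursor lives in the options structure, and every decision function receives that structure. All names here are hypothetical:

```c
#include <assert.h>

/*
 * Hypothetical options-plus-state bundle handed to every merge callback.
 * The traversal cursor lives here instead of in a separate *indpos.
 */
struct unpack_opts_sketch {
	unsigned int update : 1;   /* also update the working tree */
	unsigned int merge : 1;    /* perform a merge */
	int pos;                   /* current entry in the original index */
};

/* Every decision function shares this signature. */
typedef int (*decision_fn)(struct unpack_opts_sketch *o);

static int keep_entry_sketch(struct unpack_opts_sketch *o)
{
	o->pos++;                  /* consume one entry of the original index */
	return 1;
}

static int deleted_entry_sketch(struct unpack_opts_sketch *o)
{
	o->pos++;
	return 1;
}
```

Adding a new piece of state, such as a separate result index, then means adding a field rather than changing every function signature.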

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-04 00:19:28 -07:00
Linus Torvalds
6fda5e5180 Initialize tree descriptors with a helper function rather than by hand.
This removes slightly more lines than it adds, but the real reason for
doing this is that future optimizations will require more setup of the
tree descriptor, and so we want to do it in one place.

Also renamed the "desc.buf" field to "desc.buffer" just to trigger
compiler errors for old-style manual initializations, making sure I
didn't miss anything.
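A sketch of the pattern: one initialization helper, plus a renamed field so that leftover manual initializations fail to compile. The struct and function names are hypothetical, not git's own definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor; "buffer" (renamed from "buf") breaks old code. */
struct tree_desc_sketch {
	const void *buffer;
	unsigned long size;
};

/* Single place to set up a descriptor; later setup can be added here. */
static void tree_desc_init(struct tree_desc_sketch *desc,
			   const void *buffer, unsigned long size)
{
	desc->buffer = buffer;
	desc->size = size;
}
```

Any remaining "desc.buf = ..." elsewhere now fails to compile, which is exactly the safety net described above.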

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21 10:21:57 -07:00
Junio C Hamano
85023577a8 simplify inclusion of system header files.
This is a mechanical clean-up of the way *.c files include
system header files.

 (1) sources under compat/, platform sha-1 implementations, and
     xdelta code are exempt from the following rules;

 (2) the first #include must be "git-compat-util.h" or one of
     our own header files that include it first (e.g. config.h,
     builtin.h, pkt-line.h);

 (3) system headers that are included in "git-compat-util.h"
     need not be included in individual C source files.

 (4) "git-compat-util.h" does not have to include subsystem
     specific header files (e.g. expat.h).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20 09:51:35 -08:00
Junio C Hamano
f388cec3d7 Merge branch 'jc/read-tree-ignore'
* jc/read-tree-ignore:
  read-tree: document --exclude-per-directory
  Loosen "working file will be lost" check in Porcelain-ish
  read-tree: further loosen "working file will be lost" check.
2006-12-13 11:10:24 -08:00
Junio C Hamano
f8a9d42872 read-tree: further loosen "working file will be lost" check.
This follows up commit ed93b449 where we removed overcautious
"working file will be lost" check.

A new option "--exclude-per-directory=.gitignore" can be used to
tell the "git-read-tree" command that the user does not mind
losing contents in untracked files in the working tree, if they
need to be overwritten by a merge (either a two-way "switch
branches" merge, or a three-way merge).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-05 23:25:52 -08:00
Junio C Hamano
0fb1eaa885 unpack-trees: make sure "df_conflict_entry.name" is NUL terminated.
The structure that ends with a flexible array member (or 0
length array with older GCC) "char name[FLEX_ARRAY]" is
allocated on the stack and we use it after clearing its entire
size with memset.  That does not guarantee that "name" is
properly NUL terminated as we intended on platforms with more
forgiving structure alignment requirements.
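A sketch of heap-allocating a structure that ends in a flexible array member with explicit room for the NUL, so an empty name is reliably terminated regardless of trailing padding. struct named_entry is hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry ending in a C99 flexible array member. */
struct named_entry {
	unsigned int mode;
	char name[];   /* flexible array member: no storage of its own */
};

/*
 * Allocate room for the name *and* its terminating NUL.  Zeroing only
 * sizeof(struct named_entry) bytes is not enough: the struct may have
 * no trailing padding, so name[0] can lie just past the zeroed bytes.
 */
static struct named_entry *named_entry_new(const char *name)
{
	size_t len = strlen(name);
	struct named_entry *e = calloc(1, sizeof(*e) + len + 1);

	if (e)
		memcpy(e->name, name, len + 1);
	return e;
}
```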

Reported breakage on m68k by Roman Zippel.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-04 14:24:28 -08:00
Junio C Hamano
ed93b449c5 merge: loosen overcautious "working file will be lost" check.
The three-way merge complained unconditionally when a path that
does not exist in the index but exists in the working tree was
involved in a merge.  If the path was tracked in an old version
but is not tracked anymore, and we are merging that old version
in, the result will be that the path is not tracked.  In that
case we should not complain.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-27 17:16:39 -07:00
Shawn Pearce
e702496e43 Convert memcpy(a,b,20) to hashcpy(a,b).
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparison.

A few call sites were using char* rather than unsigned char* so
I added the cast rather than loosening hashcpy to take void*.  This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.
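A sketch of such a helper, with the hash length defined in one place. The HASH_RAWSZ macro and this definition are assumptions for illustration:

```c
#include <assert.h>
#include <string.h>

#define HASH_RAWSZ 20   /* SHA-1 is 20 bytes; the only place that says so */

/* Copy a raw hash without repeating the length at every call site. */
static inline void hashcpy(unsigned char *dst, const unsigned char *src)
{
	memcpy(dst, src, HASH_RAWSZ);
}
```

Call sites holding a char * need an explicit (unsigned char *) cast, which is the tradeoff described above.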

[jc: Split the patch into the "master" part, to be followed by a
 patch for merge-recursive.c which is not in "master" yet.

 Fixed the cast in the latter hunk to combine-diff.c which was
 wrong in the original.

 Also converted ones left-over in combine-diff.c, diff-lib.c and
 upload-pack.c ]

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23 13:53:10 -07:00
David Rientjes
a89fccd281 Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length.
Introduces global inline:

	hashcmp(const unsigned char *sha1, const unsigned char *sha2)

Uses memcmp for comparison and returns the result based on the length of
the hash name (a future runtime decision).
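A sketch of hashcmp() matching the declaration quoted above; the HASH_RAWSZ macro is an assumption standing in for "the length of the hash":

```c
#include <assert.h>
#include <string.h>

#define HASH_RAWSZ 20   /* assumed: length of a raw SHA-1 */

/* Compare two raw hashes; the length lives here, not at every call site. */
static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	return memcmp(sha1, sha2, HASH_RAWSZ);
}
```

Like memcmp(), it returns zero for equal hashes and a negative or positive value otherwise, so existing "!memcmp(a, b, 20)" tests become "!hashcmp(a, b)".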

Acked-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-17 14:23:53 -07:00
David Rientjes
96f1e58f52 remove unnecessary initializations
[jc: I needed to hand merge the changes to the updated codebase,
 so the result needs to be checked.]

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-15 21:22:20 -07:00
David Rientjes
6f002f984f use appropriate typedefs
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-15 16:12:09 -07:00
Johannes Schindelin
076b0adcf9 read-tree: move merge functions to the library
This will allow merge-recursive to use the read-tree functionality
without exec()ing git-read-tree.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-30 23:31:39 -07:00
Johannes Schindelin
16da134b1f read-trees: refactor the unpack_trees() part
Basically, the options are now passed in a struct unpack_trees_options.
That's all.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-30 23:31:31 -07:00